Abstract: Reasoning about knowledge seems to play a fundamental role in distributed
systems. Indeed, such reasoning is a central part of the informal intuitive arguments used 
in the design of distributed protocols. Communication in a distributed system can be 
viewed as the act of transforming the system's state of knowledge. This paper presents 
a general framework for formalizing and reasoning about knowledge in distributed sys- 
tems. We argue that states of knowledge of groups of processors are useful concepts 
for the design and analysis of distributed protocols. In particular, distributed knowledge 
corresponds to knowledge that is "distributed" among the members of the group, while 
common knowledge corresponds to a fact being "publicly known". The relationship be- 
tween common knowledge and a variety of desirable actions in a distributed system is 
illustrated. Furthermore, it is shown that, formally speaking, in practical systems com- 
mon knowledge cannot be attained. A number of weaker variants of common knowledge 
that are attainable in many cases of interest are introduced and investigated. 
1 Introduction



Distributed systems of computers are rapidly gaining popularity in a wide variety of 
applications. However, the distributed nature of control and information in such systems 
makes the design and analysis of distributed protocols and plans a complex task. In 
fact, at the current time, these tasks are more an art than a science. Basic foundations, 
general techniques, and a clear methodology are needed to improve our understanding 
and ability to deal effectively with distributed systems. 

While the tasks that distributed systems are required to perform are normally stated 
in terms of the global behavior of the system, the actions that a processor performs 
can depend only on its local information. Since the design of a distributed protocol 
involves determining the behavior and interaction between individual processors in the 
system, designers frequently find it useful to reason intuitively about processors' "states of 
knowledge" at various points in the execution of a protocol. For example, it is customary 
to argue that ". . . once the sender receives the acknowledgement, it knows that the 
current packet has been delivered; it can then safely discard the current packet, and send 
the next packet...". Ironically, however, formal descriptions of distributed protocols, 
as well as actual proofs of their correctness or impossibility, have traditionally avoided 
any explicit mention of knowledge. Rather, the intuitive arguments about the state of 
knowledge of components of the system are customarily buried in combinatorial proofs 
that are unintuitive and hard to follow. 

The general concept of knowledge has received considerable attention in a variety of 
fields, ranging from Philosophy [Hin62] and Artificial Intelligence [MSHI79] and [Moo85],
to Game Theory [Aum76] and Psychology [CM81]. The main purpose of this paper is
to demonstrate the relevance of reasoning about knowledge to distributed systems as 
well. Our basic thesis is that explicitly reasoning about the states of knowledge of the 
components of a distributed system provides a more general and uniform setting that 
offers insight into the basic structure and limitations of protocols in a given system. 

As mentioned above, agents can only base their actions on their local information. 
This knowledge, in turn, depends on the messages they receive and the events they 
observe. Thus, there is a close relationship between knowledge and action in a distributed 
environment. When we consider the task of performing coordinated actions among a 
number of agents in a distributed environment, it does not, in general, suffice to talk 
only about individual agents' knowledge. Rather, we need to look at states of knowledge 
of groups of agents (the group of all participating agents is often the most relevant one to 
consider). Attaining particular states of group knowledge is a prerequisite for performing 
coordinated actions of various kinds. 

In this work we define a hierarchy of states of group knowledge. It is natural to 
think of communication in the system as the act of improving the state of knowledge, 
in the sense of "climbing up the hierarchy". The weakest state of knowledge we discuss
is distributed knowledge, which corresponds to knowledge that is distributed among the
members of the group, without any individual agent necessarily having it.¹ The strongest
state of knowledge in the hierarchy is common knowledge, which roughly corresponds to
"public knowledge". We show that the execution of simultaneous actions becomes common
knowledge, and hence that such actions cannot be performed if common knowledge
cannot be attained. Reaching agreement is an important example of a desirable simulta- 
neous action in a distributed environment. A large part of the technical analysis in this 
paper is concerned with the ability and cost of attaining common knowledge in systems 
of various types. It turns out that attaining common knowledge in distributed environ- 
ments is not a simple task. We show that when communication is not guaranteed it is 
impossible to attain common knowledge. This generalizes the impossibility of a solution
to the well-known coordinated attack problem [Gra78]. A more careful analysis shows
that common knowledge can only be attained in systems that support simultaneous co- 
ordinated actions. It can be shown that such actions cannot be guaranteed or detected in 
practical distributed systems. It follows that common knowledge cannot be attained in 
many cases of interest. We then consider states of knowledge that correspond to eventu- 
ally coordinated actions and to coordinated actions that are guaranteed to be performed 
within a bounded amount of time. These are essentially weaker variants of common 
knowledge. However, whereas, strictly speaking, common knowledge may be difficult to 
attain in many practical cases, these weaker states of knowledge are attainable in cases 
of interest. 

Another question that we consider is that of when it is safe to assume that certain 
facts are common knowledge, even when strictly speaking they are not. For this purpose, 
we introduce the concept of internal knowledge consistency. Roughly speaking, it is 
internally knowledge consistent to assume that a certain state of knowledge holds at a 
given point, if nothing the processors in the system will ever encounter will be inconsistent 
with this assumption. 

The rest of the paper is organized as follows. In the next section we look at the 
"muddy children" puzzle, which illustrates some of the subtleties involved in reason- 
ing about knowledge in the context of a group of agents. In Section 3 we introduce a 
hierarchy of states of knowledge in which a group may be. Section 4 focuses on the re- 
lationship between knowledge and communication by looking at the coordinated attack 
problem. In Section 5 we sketch a general definition of a distributed system, and in 
Section 6 we discuss how knowledge can be ascribed to processors in such systems so as 
to make statements such as "agent 1 knows $\varphi$" completely formal and precise. Section 7
relates common knowledge to the coordinated attack problem. In Section 8, we show 
that, strictly speaking, common knowledge cannot be attained in practical distributed 
systems. Section 9 considers the implications of this observation and in Section 10 we 
begin to reconsider the notion of common knowledge in the light of these implications. 
In Sections 11 and 12, we consider a number of variants of common knowledge that are
attainable in many cases of interest and discuss the relevance of these states of knowledge
to the actions that can be performed in a distributed system. Section 13 discusses
the notion of internal knowledge consistency, and Section 14 contains some concluding
remarks.

¹ In a previous version of this paper [HM90], what we are now calling distributed knowledge was
called implicit knowledge. We have changed the name here to avoid conflict with the usage of the phrase
"implicit knowledge" in papers such as [FH88, Lev84].



2 The muddy children puzzle 

A crucial aspect of distributed protocols is the fact that a number of different processors 
cooperate in order to achieve a particular goal. In such cases, since more than one agent 
is present, an agent may have knowledge about other agents' knowledge in addition to 
his knowledge about the physical world. This often requires care in distinguishing subtle 
differences between seemingly similar states of knowledge. A classical example of this 
phenomenon is the muddy children puzzle, a variant of the well-known "wise men" or
"cheating wives" puzzles. The version given here is taken from [Bar81]:



Imagine n children playing together. The mother of these children has told 
them that if they get dirty there will be severe consequences. So, of course, 
each child wants to keep clean, but each would love to see the others get dirty. 
Now it happens during their play that some of the children, say k of them, 
get mud on their foreheads. Each can see the mud on others but not on his 
own forehead. So, of course, no one says a thing. Along comes the father, 
who says, "At least one of you has mud on your head," thus expressing a 
fact known to each of them before he spoke (if k > 1). The father then asks 
the following question, over and over: "Can any of you prove you have mud 
on your head?" Assuming that all the children are perceptive, intelligent, 
truthful, and that they answer simultaneously, what will happen? 



The reader may want to think about the situation before reading the rest of Barwise's 
discussion: 



There is a "proof" that the first k − 1 times he asks the question, they will
all say "no" but then the kth time the dirty children will answer "yes."

The "proof" is by induction on k. For k = 1 the result is obvious: the dirty 
child sees that no one else is muddy, so he must be the muddy one. Let us do 
k = 2. So there are just two dirty children, a and b. Each answers "no" the 
first time, because of the mud on the other. But, when b says "no," a realizes 
that he must be muddy, for otherwise b would have known the mud was on 
his head and answered "yes" the first time. Thus a answers "yes" the second 
time. But b goes through the same reasoning. Now suppose k = 3; so there 
are three dirty children, a, b, c. Child a argues as follows. Assume I don't 
have mud on my head. Then, by the k = 2 case, both b and c will answer 
"yes" the second time. When they don't, he realizes that the assumption
was false, that he is muddy, and so will answer "yes" on the third question. 
Similarly for b and c. [The general case is similar.] 

Let us denote the fact "At least one child has a muddy forehead" by m. Notice that 
if k > 1, i.e., more than one child has a muddy forehead, then every child can see at least 
one muddy forehead, and the children initially all know m. Thus, it would seem, the 
father does not need to tell the children that m holds when k > 1. But this is false! In 
fact, had the father not announced m, the muddy children would never have been able 
to conclude that their foreheads are muddy. We now sketch a proof of this fact. 

First of all, given that the children are intelligent and truthful, a child with a clean
forehead will never answer "yes" to any of the father's questions. Thus, if k = 0, all of
the children answer all of the father's questions "no". Assume inductively that if there
are exactly k muddy children and the father does not announce m, then the children
all answer "no" to all of the father's questions. Note that, in particular, when there are
exactly k muddy foreheads, a child with a clean forehead initially sees k muddy foreheads
and hears all of the father's questions answered "no". Now assume that there are exactly
k + 1 muddy children. Let q ≥ 1 and assume that all of the children answer "no" to the
father's first q − 1 questions. We have argued above that a clean child will necessarily
answer "no" to the father's qth question. Next observe that before answering the father's
qth question, a muddy child has exactly the same information as a clean child has at the
corresponding point in the case of k muddy foreheads. It follows that the muddy children
must all answer "no" to the father's qth question, and we are done. (A very similar proof
shows that if there are k muddy children and the father does announce m, his first k − 1
questions are answered "no".)
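
Both claims can be checked mechanically on small instances. The following Python sketch is illustrative only and is not part of the original argument. It assumes a possible-worlds reading of the puzzle: a child can prove it is muddy exactly when it is muddy in every world consistent with what it sees and has heard. It further assumes that a truthful public announcement of m removes the all-clean world, and that each round of simultaneous answers is heard by everyone. The names `knows_muddy` and `simulate` are hypothetical.

```python
from itertools import product

def knows_muddy(i, world, worlds):
    """True if child i, in `world`, can prove its own forehead is muddy.

    Child i sees every forehead but its own, so it considers possible exactly
    those candidate worlds that agree with `world` on all other children.
    """
    n = len(world)
    candidates = [v for v in worlds
                  if all(v[j] == world[j] for j in range(n) if j != i)]
    return all(v[i] for v in candidates)

def simulate(muddy, announce, rounds):
    """Return the children's answers for each round of the father's question.

    muddy    -- tuple of booleans, True for a muddy forehead (the actual world)
    announce -- whether the father first announces m publicly
    rounds   -- maximum number of times the father asks the question
    """
    actual = tuple(muddy)
    n = len(actual)
    worlds = set(product([False, True], repeat=n))
    if announce:
        # Assumption: a truthful public announcement of m removes the
        # all-clean world from everyone's consideration.
        worlds = {w for w in worlds if any(w)}
    history = []
    for _ in range(rounds):
        answers = tuple(knows_muddy(i, actual, worlds) for i in range(n))
        history.append(answers)
        if any(answers):
            break
        # The answers are heard by all: discard every world in which some
        # child would have answered differently.
        worlds = {w for w in worlds
                  if tuple(knows_muddy(i, w, worlds) for i in range(n)) == answers}
    return history
```

With two muddy children out of three and an announcement, `simulate((True, True, False), True, 5)` yields one round of "no" followed by "yes" from the two muddy children. Without the announcement the set of candidate worlds never shrinks, so every round consists of "no" answers, in line with the inductive proof above.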

So, by announcing something that the children all know, the father somehow manages 
to give the children useful information! How can this be? Exactly what was the role of 
the father's statement? In order to answer this question, we need to take a closer look 
at knowledge in the presence of more than one knower; this is the subject of the next 
section. 

3 A hierarchy of states of knowledge 

In order to analyze the muddy children puzzle introduced in the previous section, we 
need to consider states of knowledge of groups of agents. As we shall see in the sequel, 
reasoning about such states of knowledge is crucial in the context of distributed systems 
as well. In Section 6 we shall carefully define what it means for an agent i to know a
given fact $\varphi$ (which we denote by $K_i\varphi$). For now, however, we need knowledge to satisfy
only two properties. The first is that an agent's knowledge at a given time must depend 
only on its local history: the information that it started out with combined with the 
events it has observed since then. Secondly, we require that only true things be known, 
or more formally:

$$K_i\varphi \supset \varphi;$$

i.e., if an agent i knows $\varphi$, then $\varphi$ is true. This property, which is occasionally referred
to as the knowledge axiom, is the main property that philosophers customarily use to
distinguish knowledge from belief (cf. [HM92]).



Given a reasonable interpretation for what it means for an agent to know a fact $\varphi$,
how does the notion of knowledge generalize from an agent to a group? In other words,
what does it mean to say that a group G of agents knows a fact $\varphi$? We believe that
more than one possibility is reasonable, with the appropriate choice depending on the 
application: 



• $D_G\varphi$ (read "the group G has distributed knowledge of $\varphi$"): We say that knowledge
of $\varphi$ is distributed in G if someone who knew everything that each member of G
knows would know $\varphi$. For instance, if one member of G knows $\psi$ and another knows
that $\psi \supset \varphi$, the group G may be said to have distributed knowledge of $\varphi$.

• $S_G\varphi$ (read "someone in G knows $\varphi$"): We say that $S_G\varphi$ holds iff some member
of G knows $\varphi$. More formally,

$$S_G\varphi \equiv \bigvee_{i \in G} K_i\varphi.$$

• $E_G\varphi$ (read "everyone in G knows $\varphi$"): We say that $E_G\varphi$ holds iff all members of
G know $\varphi$. More formally,

$$E_G\varphi \equiv \bigwedge_{i \in G} K_i\varphi.$$

• $E_G^k\varphi$, for $k \ge 1$ (read "$\varphi$ is $E^k$-knowledge in G"): $E_G^k\varphi$ is defined by

$$E_G^1\varphi = E_G\varphi,$$

$$E_G^{k+1}\varphi = E_G E_G^k\varphi, \quad \text{for } k \ge 1.$$

$\varphi$ is said to be $E^k$-knowledge in G if "everyone in G knows that everyone in G
knows that . . . that everyone in G knows that $\varphi$ is true" holds, where the phrase
"everyone in G knows that" appears in the sentence k times.

• $C_G\varphi$ (read "$\varphi$ is common knowledge in G"): The formula $\varphi$ is said to be common
knowledge in G if $\varphi$ is $E_G^k$-knowledge for all $k \ge 1$. In other words,

$$C_G\varphi \equiv E_G\varphi \wedge E_G^2\varphi \wedge \cdots \wedge E_G^m\varphi \wedge \cdots$$

(We omit the subscript G when the group G is understood from context.)
Clearly, the notions of group knowledge introduced above form a hierarchy, with

$$C\varphi \supset \cdots \supset E^{k+1}\varphi \supset \cdots \supset E\varphi \supset S\varphi \supset D\varphi \supset \varphi.$$



However, depending on the circumstances, these notions might not be distinct. For 
example, consider a model of parallel computation in which a collection of n processors 
share a common memory. If their knowledge is based on the contents of the common 
memory, then we arrive at a situation in which $C\varphi \equiv E^k\varphi \equiv E\varphi \equiv S\varphi \equiv D\varphi$. By
way of contrast, in a distributed system in which n processors are connected via some
communication network and each one of them has its own memory, the above hierarchy
is strict. Moreover, in such a system, every two levels in the hierarchy can be separated
by an actual task, in the sense that there will be an action for which one level in the
hierarchy will suffice, but no lower level will. It is quite clear that this is the case with
$E\varphi \supset S\varphi \supset D\varphi$, and, as we are about to show, the "muddy children" puzzle is an
example of a situation in which $E^k\varphi$ suffices to perform a required action, but $E^{k-1}\varphi$
does not. In the next section we present the coordinated attack problem, a problem for
which $C\varphi$ suffices to perform a required action, but for no k does $E^k\varphi$ suffice.
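
The group operators of this section can be evaluated on a small finite model. The sketch below is illustrative and rests on an assumption that the text has not yet made formal: an agent knows a fact at a world if the fact holds at every world that the agent's local view cannot distinguish from it. The model encodes the example given for distributed knowledge, with one agent observing $\psi$ and the other observing $\psi \supset \varphi$. All names (`K`, `D`, `S`, `E`, `E_k`, `C`, `WORLDS`, `VIEW`) are hypothetical.

```python
from itertools import product

def K(i, phi, worlds, view):
    """Worlds at which agent i knows phi (phi is given as a set of worlds)."""
    return {w for w in worlds
            if all(v in phi for v in worlds if view[i](v) == view[i](w))}

def D(G, phi, worlds, view):
    """Distributed knowledge: pool the local views of all members of G."""
    return {w for w in worlds
            if all(v in phi for v in worlds
                   if all(view[i](v) == view[i](w) for i in G))}

def S(G, phi, worlds, view):
    """Someone in G knows phi."""
    return set().union(*(K(i, phi, worlds, view) for i in G))

def E(G, phi, worlds, view):
    """Everyone in G knows phi."""
    result = set(worlds)
    for i in G:
        result &= K(i, phi, worlds, view)
    return result

def E_k(G, phi, k, worlds, view):
    """Apply E k times."""
    for _ in range(k):
        phi = E(G, phi, worlds, view)
    return phi

def C(G, phi, worlds, view):
    """Common knowledge: since only true facts are known, the sets E^k(phi)
    shrink with k, so on a finite model iterate until nothing changes."""
    current = E(G, phi, worlds, view)
    while True:
        nxt = E(G, current, worlds, view)
        if nxt == current:
            return current
        current = nxt

# A world is a pair (psi, phi) of truth values.
WORLDS = list(product([False, True], repeat=2))
VIEW = {
    1: lambda w: w[0],                 # agent 1 observes psi
    2: lambda w: (not w[0]) or w[1],   # agent 2 observes (psi implies phi)
}
G = [1, 2]
PHI = {w for w in WORLDS if w[1]}
ACTUAL = (True, True)
```

At `ACTUAL`, `D(G, PHI, WORLDS, VIEW)` contains the world but `S(G, PHI, WORLDS, VIEW)` does not: neither agent alone can rule out a world where $\varphi$ fails, yet their combined views can. The chain of inclusions among the sets returned by `C`, `E_k`, `E`, `S`, and `D` mirrors the hierarchy stated above.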

Returning to the muddy children puzzle, let us consider the state of the children's 
knowledge of m: "At least one forehead is muddy". Before the father speaks, $E^{k-1}m$
holds, and $E^k m$ doesn't. To see this, consider the case k = 2 and suppose that Alice and
Bob are the only muddy children. Clearly everyone sees at least one muddy child, so $Em$
holds. But the only muddy child that Alice sees is Bob, and, not knowing whether she
is muddy, Alice considers it possible that Bob is the only muddy child. Alice therefore
considers it possible that Bob sees no muddy child. Thus, although both Alice and Bob
know m (i.e., $Em$ holds), Alice does not know that Bob knows m, and hence $E^2 m$ does
not hold. A similar argument works for the general case. We leave it to the reader to
check that when there are k muddy children, $E^k m$ suffices to ensure that the muddy
children will be able to prove their dirtiness, whereas $E^{k-1}m$ does not. (For a more
detailed analysis of this argument, and for a general treatment of variants of the muddy
children puzzle, see [MDH86].)



Thus, the role of the father's statement was to improve the children's state of knowledge
of m from $E^{k-1}m$ to $E^k m$. In fact, the children have common knowledge of m after
the father announces that m holds. Roughly speaking, the father's public announcement
of m to the children as a group results in all the children knowing m and knowing that
the father has publicly announced m. Assuming that it is common knowledge that all
of the children know anything the father announces publicly, it is easy to conclude that
the father's announcement makes m common knowledge. Once the father announces m,
all of the children know both m and that the father has announced m. Every child thus
knows that all of the children know m and know that the father publicly announced m,
and so $E^2 m$ holds. It is similarly possible to show that once the father announces m
then $E^k m$ holds for all k, so $Cm$ holds (see Section 10 for further discussion). Since, in
particular, $E^k m$ holds, the muddy children can succeed in proving their dirtiness.
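
The change in the children's state of knowledge can be computed directly for small n. The sketch below is an illustration under stated assumptions, not the paper's formal treatment: a world is a tuple of booleans, child i cannot distinguish two worlds that differ only in entry i, and the public announcement is modeled by discarding the all-clean world. The function names are hypothetical.

```python
from itertools import product

def everyone_knows(phi, worlds):
    """Worlds (within `worlds`) at which every child knows phi.

    Child i cannot see its own forehead, so from world w it also considers
    possible the world obtained by flipping entry i, if that world is still
    under consideration.
    """
    def flip(w, i):
        return w[:i] + (not w[i],) + w[i + 1:]
    return {w for w in worlds & phi
            if all(flip(w, i) not in worlds or flip(w, i) in phi
                   for i in range(len(w)))}

def e_k(phi, k, worlds):
    """Apply `everyone_knows` k times (k = 0 returns phi itself)."""
    for _ in range(k):
        phi = everyone_knows(phi, worlds)
    return phi

def common(phi, worlds):
    """Iterate `everyone_knows` until the set stops shrinking."""
    current = everyone_knows(phi, worlds)
    while True:
        nxt = everyone_knows(current, worlds)
        if nxt == current:
            return current
        current = nxt

def check(n, k):
    """Return (E^{k-1} m before, E^k m before, C m after) at the actual world."""
    all_worlds = set(product([False, True], repeat=n))
    m = {w for w in all_worlds if any(w)}
    actual = (True,) * k + (False,) * (n - k)
    before_k_minus_1 = actual in e_k(m, k - 1, all_worlds)
    before_k = actual in e_k(m, k, all_worlds)
    # Assumption: after the public announcement only worlds satisfying m remain.
    after_common = actual in common(m, m)
    return before_k_minus_1, before_k, after_common
```

For instance, `check(4, 3)` returns `(True, False, True)`: with three muddy children out of four, $E^2 m$ holds before the father speaks, $E^3 m$ does not, and after the announcement m is common knowledge in this model.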
The vast majority of the communication in a distributed system can also be viewed
as the act of improving the state of knowledge (in the sense of "climbing up a hierarchy")
of certain facts. This is an elaboration of the view of communication in a network as
the act of "sharing knowledge". Taking this view, two notions come to mind. One
is fact discovery - the act of changing the state of knowledge of a fact $\varphi$ from being
distributed knowledge to levels of explicit knowledge (usually S-, E-, or C-knowledge),
and the other is fact publication - the act of changing the state of knowledge of a fact
that is not common knowledge to common knowledge. An example of fact discovery is
the detection of global properties of a system, such as deadlock. The system initially
has distributed knowledge of the deadlock, and the detection algorithm improves this
state to S-knowledge (see [CL85] for work related to fact discovery). An example of
fact publication is the introduction of a new communication convention in a computer
network. Here the initiator(s) of the convention wish to make the new convention common
knowledge.

In the rest of the paper we devote a considerable amount of attention to fact pub- 
lication and common knowledge. As we shall show, common knowledge is inherent in 
a variety of notions of agreement, conventions, and coordinated action. Furthermore, 
having common knowledge of a large number of facts allows for more efficient commu- 
nication. Since these are goals frequently sought in distributed computing, the problem 
of fact publication — how to attain common knowledge — becomes crucial. Common 
knowledge is also a basic notion in everyday communication between people. For ex- 
ample, shaking hands to seal an agreement signifies that the handshakers have common 
knowledge of the agreement. Also, it can be argued [CM81] that when we use a definite
reference such as "the president" in a sentence, we assume common knowledge of who is 
being referred to. 

In [CM81], Clark and Marshall present two basic ways in which a group can come
to have common knowledge of a fact. One is by membership in a community, e.g.,
the meaning of a red traffic light is common knowledge in the community of licensed 
drivers. The other is by being copresent with the occurrence of the fact, e.g., the father's 
gathering the children and publicly announcing the existence of muddy foreheads made 
that fact common knowledge. Notice that if, instead, the father had taken each child 
aside (without the other children noticing) and told her or him about it privately, this 
information would have been of no help at all. 

In the context of distributed systems, community membership corresponds to infor- 
mation that the processors are guaranteed to have by virtue of their presence in the 
system (e.g., information that is "inserted into" the processors before they enter the sys- 
tem). However, it is not obvious how to simulate copresence or "public" announcements 
using message passing in a distributed system. As we shall see, there are serious problems 
and unexpected subtleties involved in attempting to do so. 
4 The coordinated attack problem

To get a flavor of the issues involved in attaining common knowledge by simulating 
copresence in a distributed system, consider the coordinated attack problem, originally 
introduced by Gray [Gra78]:



Two divisions of an army are camped on two hilltops overlooking a common 
valley. In the valley awaits the enemy. It is clear that if both divisions 
attack the enemy simultaneously they will win the battle, whereas if only one 
division attacks it will be defeated. The divisions do not initially have plans 
for launching an attack on the enemy, and the commanding general of the 
first division wishes to coordinate a simultaneous attack (at some time the 
next day). Neither general will decide to attack unless he is sure that the 
other will attack with him. The generals can only communicate by means 
of a messenger. Normally, it takes the messenger one hour to get from one 
encampment to the other. However, it is possible that he will get lost in the 
dark or, worse yet, be captured by the enemy. Fortunately, on this particular 
night, everything goes smoothly. How long will it take them to coordinate an 
attack? 



We now show that despite the fact that everything goes smoothly, no agreement can 
be reached and no general can decide to attack. (This is, in a way, a folk theorem of 
operating systems theory; cf. [Gal79, Gra78, YC79].) Suppose General A sends a message
to General B saying "Let's attack at dawn", and the messenger delivers it an hour later. 
General A does not immediately know whether the messenger succeeded in delivering the 
message. And because B would not attack at dawn if the messenger is captured and fails 
to deliver the message, A will not attack unless he knows that the message was successfully 
delivered. Consequently, B sends the messenger back to A with an acknowledgement. 
Suppose the messenger delivers the acknowledgement to A an hour later. Since B knows 
that A will not attack without knowing that B received the original message, he knows 
that A will not attack unless the acknowledgement is successfully delivered. Thus, B will 
not attack unless he knows that the acknowledgement has been successfully delivered. 
However, for B to know that the acknowledgement has been successfully delivered, A 
must send the messenger back with an acknowledgement to the acknowledgement .... 
Similar arguments can be used to show that no fixed finite number of acknowledgements, 
acknowledgements to acknowledgements, etc. suffices for the generals to attack. Note 
that in the discussion above the generals are essentially running a handshake protocol 
(cf. [Gra78]). The above discussion shows that for no k does a k-round handshake protocol
guarantee that the generals be able to coordinate an attack.

In fact, we can use this intuition to actually prove that the generals can never attack
and be guaranteed that they are attacking simultaneously. We argue by induction on d,
the number of messages delivered by the time of the attack, that d messages do not
suffice. Clearly, if no message is delivered, then B will not know of the intended attack,
and a simultaneous attack is impossible. For the inductive step, assume that k messages
do not suffice. If k + 1 messages suffice, then the sender of the (k + 1)st message attacks
without knowing whether his last message arrived. Since whenever one general attacks
they both do, the intended receiver of the (k + 1)st message must attack regardless of
whether the (k + 1)st message is delivered. Thus, the (k + 1)st message is irrelevant, and
k messages suffice, contradicting the inductive hypothesis.

After presenting a detailed proof of the fact that no protocol the generals can use will 
satisfy their requirements and allow them to coordinate an attack, Yemini and Cohen in 
[YC79] make the following remark:

. . . Furthermore, proving protocols correct (or impossible) is a difficult and 
cumbersome art in the absence of proper formal tools to reason about proto- 
cols. Such backward-induction argument as the one used in the impossibility 
proof should require less space and become more convincing with a proper 
set of tools. 



Yemini and Cohen's proof does not explicitly use reasoning about knowledge, but it 
uses a many-scenarios argument to show that if the generals both attack in one scenario, 
then there is another scenario in which one general will attack and the other will not. The 
crucial point is that the actions that should be taken depend not only on the actual state 
of affairs (in this case, the messenger successfully delivering the messages), but also (and 
in an acute way) on what other states of affairs the generals consider possible. Knowledge 
is just the dual of possibility, so reasoning about knowledge precisely captures the many- 
scenario argument in an intuitive way. We feel that understanding the role knowledge 
plays in problems such as coordinated attack is a first step towards simplifying the task 
of designing and proving the correctness of protocols. 

A protocol for the coordinated attack problem, if one did exist, would ensure that 
when the generals attack, they are guaranteed to be attacking simultaneously. Thus, in 
a sense, an attacking general (say A) would know that the other general (say B) is also 
attacking. Furthermore, A would know that B similarly knows that A is attacking. It is 
easy to extend this reasoning to show that when the generals attack they have common 
knowledge of the attack. However, each message that the messenger delivers can add at 
most one level of knowledge about the desired attack, and no more. For example, when 
the message is first delivered to B, B knows about A's desire to coordinate an attack, 
but A does not know whether the message was delivered, and therefore A does not know 
that B knows about the intended attack. And when the messenger returns to A with B's
acknowledgement, A knows that B knows about the intended attack, but, not knowing 
whether the messenger delivered the acknowledgement, B does not know that A knows 
(that B knows of the intended attack). This in some sense explains why the generals 
cannot reach an agreement to attack using a finite number of messages. We are about 
to formalize this intuition. Indeed, we shall prove a more general result from which the 
inability to achieve a guaranteed coordinated attack will follow as a corollary. Namely,
we prove that communication cannot be used to attain common knowledge in a system 
in which communication is not guaranteed, and formally relate a guaranteed coordinated 
attack to attaining common knowledge. Before we do so, we need to define some of the 
terms that we use more precisely. 
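
The claim that each delivered message adds at most one level of knowledge can be illustrated with a toy model. The sketch below is not the formal development that follows; it makes several simplifying assumptions. The generals alternate messages in a handshake starting with A, a scenario is identified with the number d of messages delivered before one is lost, timing is ignored, and a general's local view is just the number of messages it has received. The fact tracked is "B has received A's proposal" (d ≥ 1). All function names are hypothetical.

```python
def view_a(d):
    """Messages A has received when d messages were delivered in total
    (A receives the 2nd, 4th, ... message of the handshake)."""
    return d // 2

def view_b(d):
    """Messages B has received (the 1st, 3rd, ... message of the handshake)."""
    return (d + 1) // 2

def knows(view, phi, worlds):
    """Scenarios in which a general with this view knows phi."""
    return {w for w in worlds
            if all(v in phi for v in worlds if view(v) == view(w))}

def knowledge_levels(max_messages):
    """Return [K_B phi, K_A K_B phi, K_B K_A K_B phi, ...] as sets of scenarios,
    where phi is 'B has received the proposal'."""
    worlds = range(max_messages + 1)
    current = {d for d in worlds if d >= 1}
    levels = []
    for depth in range(1, max_messages + 2):
        # Odd depths add "B knows that", even depths add "A knows that".
        view = view_b if depth % 2 == 1 else view_a
        current = knows(view, current, worlds)
        levels.append(current)
    return levels
```

In this model the depth-k formula holds exactly in the scenarios with at least k delivered messages, so `knowledge_levels(5)` returns the sets {1,…,5}, {2,…,5}, and so on, ending with the empty set at depth 6. No finite number of deliveries makes every level hold, which matches the informal argument above.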



5 A general model of a distributed system 

We now present a general model of a distributed environment. Formally, we model such 
an environment by a distributed system, where the agents are taken to be processors 
and interaction between agents is modeled by messages sent between the processors over 
communication links. For the sake of generality and applicability to problems involving 
synchronization in distributed systems, our treatment will allow processors to have hard- 
ware clocks. Readers not interested in such issues can safely ignore all reference to clocks 
made throughout the paper. 

We view a distributed system as a finite collection $\{p_1, p_2, \ldots, p_n\}$ of two or more
processors that are connected by a communication network. We assume an external
source of "real time" that in general is not directly observable by the processors. The
processors are state machines that possibly have clocks, where a clock is a monotone
nondecreasing function of real time. If a processor has a clock, then we assume that its
clock reading is part of its state. (This is in contrast to the approach taken by Neiger and
Toueg in [NT93]; the difference is purely a matter of taste.) The processors communicate
with each other by sending messages along the links in the network.

A run r of a distributed system is a description of an execution of the system, from 
time 0 until the end of the execution. (We assume for simplicity that the system executes 
forever. If it terminates after finite time, we can just assume that it remains in the same 
state from then on.) A point is a pair (r, t) consisting of a run r and a time t ≥ 0. We 
characterize the run r by associating with each point (r, t) every processor p_i's local history 
at (r, t), denoted h(p_i, r, t). Roughly speaking, h(p_i, r, t) consists of the sequence of events 
that p_i has observed up to time t in run r. We now formalize this notion. We assume that 
processor p_i "wakes up" or joins the system in run r at some time t_init(p_i, r) ≥ 0. The 
processor's local state when it wakes up is called its initial state. The initial configuration 
of a run consists of the initial state and the wake-up time for each processor. In systems 
with clocks, the clock time function τ describes processors' clock readings; τ(p_i, r, t) is 
the reading of p_i's clock at the point (r, t). Thus, τ(p_i, r, t) is undefined for t < t_init(p_i, r) 
and is a monotonic nondecreasing function of t for t ≥ t_init(p_i, r). We say that r and r' 
have the same clock readings if τ(p_i, r, t) = τ(p_i, r', t) for all processors p_i and all times 
t. (If there are no clocks in the system, we say for simplicity that all runs have the same 
clock readings.) We take h(p_i, r, t) to be empty if t < t_init(p_i, r). For t ≥ t_init(p_i, r), the 
history h(p_i, r, t) consists of p_i's initial state and the sequence of messages p_i has sent and 
received up to, but not including, those sent or received at time t (in the order they were 
sent/received). We assume that this sequence of messages is finite. If p_i has a clock, the 
messages are also marked with the time at which they were sent or received (i.e., with 
τ(p_i, r, t), if they were sent or received at time t), and the history includes the range of 
values that the clock has read up to and including time t. If we consider randomized 
protocols, then h(p_i, r, t) also includes p_i's random coin tosses. For ease of exposition, 
we restrict attention to deterministic protocols in this paper. In a deterministic system 
with no external inputs and no failures, a processor's internal state will be a function of 
its history. Thus, the sequence of internal states that a processor goes through can be 
recovered from its history. 
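
The definition of a local history can be illustrated with a minimal Python sketch. It assumes integer time and a processor without a clock; the `Event` record and the `history` function are hypothetical names introduced only for this illustration. The point it shows is that h(p_i, r, t) is empty before the wake-up time and excludes events occurring exactly at time t.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    time: int      # real time of the event (not visible to a clockless processor)
    kind: str      # "send" or "recv"
    message: str


def history(initial_state: str, t_init: int,
            events: Tuple[Event, ...], t: int) -> tuple:
    """Sketch of h(p_i, r, t) for a processor without a clock."""
    if t < t_init:
        return ()  # the history is empty before the processor wakes up
    # Keep only events strictly before time t, in the order they occurred.
    past = sorted((e for e in events if e.time < t), key=lambda e: e.time)
    return (initial_state, tuple((e.kind, e.message) for e in past))


events = (Event(2, "send", "m1"), Event(5, "recv", "m2"))
print(history("s0", 1, events, 0))  # before wake-up
print(history("s0", 1, events, 5))  # the receive at time 5 is not yet included
print(history("s0", 1, events, 6))
```

Real time appears only as a filter here and is not stored in the history, which matches the assumption that real time is not directly observable by the processors.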

Corresponding to every distributed system, given an appropriate set of assumptions 
about the properties of the system and its possible interaction with its environment, there 
is a natural set R of all possible runs of the system. We identify a distributed system 
with such a set R of its possible runs. For ease of exposition, we sometimes slightly abuse 
the language and talk about a point (r, t) as being a point of R when r ∈ R. A run r' is 
said to extend a point (r, t) if h(p_i, r, t') = h(p_i, r', t') for all t' ≤ t and all processors p_i. 
Observe that r' extends (r, t) iff r extends (r', t). 

Identifying a system with a set of runs is an important idea that will play a crucial 
role in allowing us to make precise the meaning of knowledge in a distributed system. 
The relative behavior of clocks, the properties of communication in the system, and 
many other properties of the system, are directly reflected in the properties of this set 
of runs. Thus, for example, a system is synchronous exactly if in all possible runs of the 
system the processors and the communication medium work in synchronous phases. A 
truly asynchronous system is one in which the set of runs allows any message sent to be 
delayed an unbounded amount of time before being delivered. (We discuss asynchrony in 
greater detail in Section 8.) Clocks are guaranteed to be synchronized to within a bound 
of δ if they differ by no more than δ time units at all points in all runs of the system. If 
we view the set of runs as a probability space with some appropriate measure, then we 
can also capture probabilistic properties of the environment and formalize probabilistic 
protocols in this framework. 

We shall often be interested in the set of runs generated by running a particular 
protocol, under some assumptions on the communication medium. Intuitively, a protocol 
is a function specifying what actions a processor takes (which in our case amounts to 
what messages it sends) at any given point (after the processor wakes up) as a function 
of the processor's local state. Since a processor's local state is determined by its history, 
we simply define a protocol to be a deterministic function specifying what messages the 
processor should send at any given instant, as a function of the processor's history. Recall 
that h(p_i, r, t), processor p_i's history at the point (r, t), does not include messages sent 
or received at time t, so a processor's actions at time t according to a protocol depend 
only on messages received in the past. As we mentioned above, for ease of exposition we 
restrict attention to deterministic protocols in this paper. The definitions and results can 
be extended to nondeterministic and probabilistic protocols in a straightforward way. A 
joint protocol for G is a tuple consisting of a protocol for every processor in G. 
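
A protocol in this sense can be sketched as a plain function from histories to lists of outgoing messages. The Python sketch below makes several assumptions that are not part of the text: time is discrete, every message is delivered after a fixed delay, and the names `simulate`, `general_a`, and `general_b` are hypothetical. It is meant only to show that the action taken at time t depends on a history that excludes time-t events.

```python
from typing import Callable, Dict, List, Tuple

History = tuple                                       # (initial state, events)
Protocol = Callable[[History], List[Tuple[str, str]]]  # history -> [(recipient, msg)]


def simulate(protocols: Dict[str, Protocol], initial: Dict[str, str],
             steps: int, delay: int = 1) -> Dict[str, History]:
    """Return each processor's history at time `steps` (assumed reliable links)."""
    hist = {p: (initial[p], ()) for p in protocols}
    in_flight = []  # entries: (delivery_time, sender, recipient, message)
    for t in range(steps):
        # Actions at time t are chosen from histories that exclude time-t events.
        sends = {p: protocols[p](hist[p]) for p in protocols}
        arriving = [m for m in in_flight if m[0] == t]
        in_flight = [m for m in in_flight if m[0] != t]
        for p in protocols:
            events = list(hist[p][1])
            for (_, sender, recipient, msg) in arriving:
                if recipient == p:
                    events.append(("recv", sender, msg))
            for (recipient, msg) in sends[p]:
                events.append(("send", recipient, msg))
                in_flight.append((t + delay, p, recipient, msg))
            hist[p] = (hist[p][0], tuple(events))
    return hist


def general_a(h: History) -> List[Tuple[str, str]]:
    # Send "attack" once, at the first opportunity.
    already_sent = any(e[0] == "send" for e in h[1])
    return [] if already_sent else [("B", "attack")]


def general_b(h: History) -> List[Tuple[str, str]]:
    # Acknowledge each received message exactly once.
    received = sum(1 for e in h[1] if e[0] == "recv")
    sent = sum(1 for e in h[1] if e[0] == "send")
    return [("A", "ack")] if received > sent else []


joint_protocol = {"A": general_a, "B": general_b}
final = simulate(joint_protocol, {"A": "a0", "B": "b0"}, steps=4)
print(final["A"])
print(final["B"])
```

Varying the delivery assumptions (for instance, dropping messages) would generate other runs of the same joint protocol; the set of all such runs is what the text identifies with the system.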
6 Ascribing knowledge to processors 



What does it mean to say that a processor knows a fact φ? In our opinion, there is no 
unique "correct" answer to this question. Different interpretations of knowledge in a 
distributed system are appropriate for different applications. For example, an interpretation 
by which a processor is said to know φ only if φ appears explicitly in a designated part 
of the processor's storage (its "database") seems interesting for certain applications. In 
other contexts we may be interested in saying that a processor knows φ if the processor 
could deduce φ from the information available to it. In this section we give precise 
definitions of interpretations of knowledge in a distributed system. 

We assume the existence of an underlying logical language of formulas for representing 
ground facts about the system. A ground fact is a fact about the state of the system that 
does not explicitly involve processors' knowledge. For example, "the value of register x 
is 0", or "processor pi sent the message m to processor pj", are ground facts. 

We extend the original language of ground formulas to a language that is closed under 
operators for knowledge, distributed knowledge, everyone knows, and common knowledge 
(so that for every formula φ, processor p_i, and subset G of the processors, K_i φ, D_G φ, 
E_G φ, and C_G φ are formulas), and under Boolean connectives. (In Section 11 we consider 
additional operators.) 

We now describe one of the most natural ways of ascribing knowledge to processors 
in a distributed system, which we call view-based knowledge interpretations. At every 
point each processor is assigned a view; we say that two points are indistinguishable to 
the processor if it has the same view in both. A processor is then said to know a fact 
at a given point exactly if the fact holds at all of the points that the processor cannot 
distinguish from the given one. Roughly speaking, a processor knows all of the facts that 
(information theoretically) follow from its view at the current point.² 

More formally, a view function v for a system R assigns to every processor at any 
given point of R a view from some set Σ of views (the structure of Σ is not relevant at 
this point); i.e., v(p_i, r, t) ∈ Σ for each processor p_i and point (r, t) of R. Given that a 
processor's history captures all of the events in the system that a processor may possibly 
observe, we require the processor's view at any given point to be a function of its history 
at that point. In other words, whenever h(p_i, r, t) = h(p_i, r', t'), it must also be the case 
that v(p_i, r, t) = v(p_i, r', t'). 
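
The requirement that a view be a function of the history can be checked mechanically on a finite set of points. The following Python sketch is illustrative only: it considers a single processor, and `is_view_function`, `hist`, `view_len`, and `view_time` are hypothetical names.

```python
def is_view_function(points, hist, view) -> bool:
    """True iff equal histories always yield equal views on the given points."""
    seen = {}
    for pt in points:
        h = hist(pt)
        if h in seen and seen[h] != view(pt):
            return False  # same history, different views: not allowed
        seen.setdefault(h, view(pt))
    return True


# Three points (run, time) of one run; the processor receives "m" at time 1,
# so its history changes only between times 1 and 2.
points = [("r", 0), ("r", 1), ("r", 2)]
histories = {
    ("r", 0): ("s0", ()),
    ("r", 1): ("s0", ()),
    ("r", 2): ("s0", ("m",)),
}
hist = lambda pt: histories[pt]

view_len = lambda pt: len(hist(pt)[1])  # number of messages seen so far
view_time = lambda pt: pt[1]            # real time, which a clockless processor cannot see

print(is_view_function(points, hist, view_len))
print(is_view_function(points, hist, view_time))
```

The second candidate fails because the points at times 0 and 1 carry the same history but would be assigned different views.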



2 In a previous version of this paper [HM90], view-based knowledge interpretations were called 
state-based interpretations. Particular view-based knowledge interpretations were first suggested to us 
independently by Cynthia Dwork and by Stan Rosenschein. Since the appearance of [HM90], most 
authors who considered knowledge in distributed systems have focussed on view-based interpretations; 
cf. [CM86, DM90, FI86, HF85, LR86, MT88, PR85, RK86] and [Hal87] for an overview. (See 
[FH88, Mos88] for examples of interpretations of knowledge that are not view based.) The approach 
taken to defining knowledge in view-based systems is closely related to the possible-worlds approach 
taken by Hintikka [Hin62]. For us the "possible worlds" are the points in the system; the "agents" are 
the processors. A processor in one world (i.e., point) considers another world possible if it has the same 
view in both. 






A view-based knowledge interpretation X is a triple (R, π, v), consisting of a set of 
runs R, an assignment π which associates with every point in R a truth assignment to 
the ground facts (so that for every point (r, t) in R and every ground fact P, we have 
π(r, t)(P) ∈ {true, false}), and a view function v for R. A triple (X, r, t), where X is a 
knowledge interpretation and (r, t) is a point of R, is called a knowledge point. Formulas 
are said to be true or false of knowledge points. Let X = (R, π, v). We can now define 
the truth of a formula φ at a knowledge point (X, r, t), denoted (X, r, t) |= φ (and also 
occasionally read "φ holds at (X, r, t)" or just "φ holds at (r, t)", if the interpretation X 
is clear from context), by induction on the structure of formulas: 

(a) If P is a ground formula then (X, r, t) |= P iff π(r, t)(P) = true. 

(b) (X, r, t) |= ¬φ iff (X, r, t) ⊭ φ. 

(c) (X, r, t) |= φ_1 ∧ φ_2 iff (X, r, t) |= φ_1 and (X, r, t) |= φ_2. 

(d) (X, r, t) |= K_i φ iff (X, r', t') |= φ for all (r', t') in R satisfying v(p_i, r, t) = v(p_i, r', t'). 

Part (a) says the truth value of ground facts is defined by π. Parts (b) and (c) state 
that negations and conjunctions have their classical meaning. Part (d) captures the fact 
that a processor p_i's knowledge at a point (r, t) is completely determined by its view v(p_i, r, t). 
The processor does not know φ in a given view exactly if there is a point (in R) at which 
the processor has that same view, and φ does not hold. The definitions of when E_G φ 
and C_G φ hold at a knowledge point follow directly from the definition of E_G and C_G in 
Section 3: 

(e) (X, r, t) |= E_G φ iff (X, r, t) |= K_i φ for all p_i ∈ G. 

(f) (X, r, t) |= C_G φ iff (X, r, t) |= E_G^k φ for all k > 0. 

Let us consider when a group G of processors has distributed knowledge of a fact. 
Intuitively, a group's distributed knowledge is the combined knowledge of all of its mem- 
bers. For example, we could imagine considering the group as being able to distinguish 
two points if one (or more) of its members can distinguish them. The set of points in- 
distinguishable by G from the current one is then the intersection of the sets of points 
indistinguishable by the individual members of the group. We can therefore define when 
a group G has distributed knowledge of a fact φ as follows: 

(g) (X, r, t) |= D_G φ iff (X, r', t') |= φ for all (r', t') in R satisfying v(p_i, r, t) = v(p_i, r', t') 
for all p_i ∈ G. 

Notice that indeed under this definition, if one member of G knows φ while another 
member knows that φ ⊃ ψ, then the members of G have distributed knowledge of 
ψ. The definition of distributed knowledge given above is in a precise sense a direct 
generalization of the definition of individual processors' knowledge in clause (d) above. 
We can define the joint view assigned by v to G to be 

v(G, r, t) = {(p_i, v(p_i, r, t)) : p_i ∈ G}. 

It is easy to check that (X, r, t) |= D_G φ iff (X, r', t') |= φ for all (r', t') in R satisfying 
v(G, r, t) = v(G, r', t'). Thus, we can identify the distributed knowledge of a group G with 
the knowledge of an agent whose view is the group's joint view.³ Note that the knowledge 
distributed in a group of size one coincides with its unique member's knowledge. 

View-based interpretations will prove to be a useful way of ascribing knowledge to 
processors for the purpose of the design and analysis of distributed protocols. We now 
discuss some of the basic properties of knowledge in view-based interpretations. Fix a 
system R and a view function v. We can construct a graph corresponding to R and v 
by taking the nodes of the graph to be all the points of R, and joining two nodes (r, t) 
and (r', t') by an edge labeled p_i if v(p_i, r, t) = v(p_i, r', t'); i.e., if p_i has the same view at 
both points. Our definition of knowledge under a view-based interpretation immediately 
implies that K_i φ holds at a given point (r, t) if and only if φ holds at all points (r', t') that 
share an edge labeled p_i with (r, t). Define a point (r', t') in this graph to be G-reachable 
from (r, t) in k steps (with respect to the view function v) if there exist points (r_0, t_0), 
(r_1, t_1), ..., (r_k, t_k) such that (r, t) = (r_0, t_0), (r', t') = (r_k, t_k), and for every i < k there is 
a processor p_{j_i} ∈ G such that (r_i, t_i) and (r_{i+1}, t_{i+1}) are joined by an edge labeled p_{j_i}. It 
follows that E_G φ holds at (r, t) under this view-based interpretation exactly if φ holds at 
all points G-reachable from (r, t) in 1 step. An easy induction on k shows that E_G^k φ holds 
exactly if φ holds at all points G-reachable in k steps. Consequently, it is easy to see that 
C_G φ holds at a point (r, t) if and only if φ holds at all points that are G-reachable from 
(r, t) in a finite number of steps. In the particular case that G is the set of all processors, 
C_G φ holds at (r, t) exactly if φ holds at all points in the same connected component 
of the graph as (r, t). 

The way distributed knowledge is represented in this graph is also instructive: D_G φ 
holds at a given point (r, t) iff φ holds at all points (r', t') such that for each p_i ∈ G, 
there is an edge between (r, t) and (r', t') labeled by p_i. Thus, for distributed knowledge 
the set of points we need to consider is the intersection of the sets of points we consider 
when determining what facts each individual processor knows. 
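
The graph characterizations above translate directly into a small model checker over a finite set of points. The Python sketch below is an illustration under simplifying assumptions: points are abstract tuples rather than (run, time) pairs, facts are predicates on points, and the function names are hypothetical.

```python
def knows(points, view, i, fact, pt):
    """K_i: fact holds at every point sharing an edge labeled i with pt."""
    return all(fact(q) for q in points if view(i, q) == view(i, pt))


def everyone_knows(points, view, G, fact, pt):
    """E_G: every member of G knows the fact."""
    return all(knows(points, view, i, fact, pt) for i in G)


def distributed(points, view, G, fact, pt):
    """D_G: fact holds at every point that no member of G can distinguish from pt."""
    return all(fact(q) for q in points
               if all(view(i, q) == view(i, pt) for i in G))


def common(points, view, G, fact, pt):
    """C_G: fact holds at every point G-reachable from pt in one or more steps."""
    reached, frontier = set(), {pt}
    while frontier:
        step = {q for p in frontier for q in points for i in G
                if view(i, q) == view(i, p)} - reached
        reached |= step
        frontier = step
    return all(fact(q) for q in reached)


# Points are pairs (a, b); processor "A" sees only a, processor "B" sees only b.
points = [(a, b) for a in (0, 1) for b in (0, 1)]
view = lambda i, q: q[0] if i == "A" else q[1]
G = ["A", "B"]

at_least_one = lambda q: q[0] + q[1] >= 1
both_one = lambda q: q == (1, 1)

print(everyone_knows(points, view, G, at_least_one, (1, 1)))  # each sees its own 1
print(common(points, view, G, at_least_one, (1, 1)))          # (0, 0) is reachable in 2 steps
print(knows(points, view, "A", both_one, (1, 1)))             # A cannot rule out (1, 0)
print(distributed(points, view, G, both_one, (1, 1)))         # the joint view pins down the point
```

In this toy model "at least one coordinate is 1" is known to everyone at (1, 1) yet is not common knowledge, while "both coordinates are 1" is distributed knowledge without being known to either processor.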

By describing the various notions of knowledge in the view-based case via this graph, it 
becomes easier to investigate their properties. Indeed, this graph is very closely related to 
Kripke structures, a well-known standard way of modeling modal logics. Drawing 
on the theory of modal logics, we can immediately see that the definition of knowledge 
in view-based interpretations agrees with the well-known modal logic S5 (cf. [HM92]). 

3 The knowledge ascribed to a set of processes by Chandy and Misra in [CM86] essentially corresponds 
to the distributed knowledge of that set, as defined here. See also [PR85, RK86]. 

A modal operator M is said to have the properties of S5 if it satisfies the following axioms 
and rule of inference: 

A1. The knowledge axiom: Mφ ⊃ φ, 

A2. The consequence closure axiom: Mφ ∧ M(φ ⊃ ψ) ⊃ Mψ, 

A3. The positive introspection axiom: Mφ ⊃ MMφ, 

A4. The negative introspection axiom: ¬Mφ ⊃ M¬Mφ, and 

R1. The rule of necessitation: From φ infer Mφ. 

Given a knowledge interpretation X for a system R, a fact φ is said to be valid in the 
system if it holds at all knowledge points (X, r, t) for points (r, t) of R. In our context 
the rule R1 means that whenever φ is valid in the system, so is Mφ. 

We can now show: 



Proposition 1: Under view-based knowledge interpretations, the operators K_i, D_G, 
and C_G all have the properties of S5. 



The proof is a consequence of the fact that the definitions of these notions are based 
on equivalence relations (over points): The relation of processor p_i's having the same view 
at two points, the relation of all processors in G having the same joint views at both 
points, and the relation of being reachable via a path consisting solely of edges labeled 
by members of G in the graph corresponding to the view, are all equivalence relations. 
The proof of this proposition can be found in [HM92]. 
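
As a sanity check rather than a proof, the S5 properties can be verified by brute force on a small finite model for a single operator K_i. The Python sketch below represents a fact by the set of points at which it holds; the model, the helper names, and the set-based encoding of the axioms are assumptions made for this illustration.

```python
from itertools import chain, combinations


def K(points, view, S):
    """Points at which the processor knows the fact S (a set of points)."""
    return frozenset(p for p in points
                     if all(q in S for q in points if view(q) == view(p)))


def powerset(xs):
    xs = list(xs)
    return (frozenset(c) for c in
            chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1)))


def satisfies_s5(points, view) -> bool:
    pts = frozenset(points)
    for S in powerset(pts):
        KS = K(pts, view, S)
        if not KS <= S:                        # A1: K phi implies phi
            return False
        if not KS <= K(pts, view, KS):         # A3: K phi implies K K phi
            return False
        not_KS = pts - KS
        if not not_KS <= K(pts, view, not_KS): # A4: not K phi implies K not K phi
            return False
        for T in powerset(pts):
            implication = (pts - S) | T        # points where (phi implies psi) holds
            if not (KS & K(pts, view, implication)) <= K(pts, view, T):  # A2
                return False
    return K(pts, view, pts) == pts            # R1: valid facts are known everywhere


# Four points; the processor's view partitions them into {0, 1} and {2, 3}.
print(satisfies_s5(range(4), lambda p: p // 2))
```

Because any view function induces an equivalence relation on points, the check succeeds for every choice of `view`; this mirrors the reason the proposition holds.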

In addition to having the properties of S5, common knowledge has two additional 
useful properties under view-based interpretations: 



C1. The fixed point axiom: C_G φ ≡ E_G(φ ∧ C_G φ), and 

C2. The induction rule: From φ ⊃ E_G(φ ∧ ψ) infer φ ⊃ C_G ψ. 



The fixed point axiom essentially characterizes C_G φ as the solution of a fixed point 
equation (in fact, it is the greatest solution; we discuss this in more detail in Section 11 
and Appendix A). This property of common knowledge is crucial in many of our proofs. 

Intuitively, the induction rule says that if φ is "public" and implies ψ, so that whenever 
φ holds then everybody knows φ ∧ ψ, then whenever φ holds, ψ is common knowledge. 
We call it the "induction rule" because it is closely related to the notion of induction in 
arithmetic: Using the fact that φ ⊃ E_G(φ ∧ ψ) is valid in the system, we can prove by 
induction on k that φ ⊃ E_G^k(φ ∧ ψ) is also valid in the system, for all k > 0. It then 
follows that φ ⊃ C_G ψ is valid in the system. Roughly speaking, this proof traces our 
line of reasoning when we argued that the children in the muddy children puzzle attain 
common knowledge of the father's statement. We can get an important special case of 
the Induction Rule by taking ψ to be φ. Since E_G(φ ∧ φ) is equivalent to E_G φ, we get 
that from φ ⊃ E_G φ we can infer φ ⊃ C_G φ. 
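
The induction behind rule C2 can be written out step by step. The derivation below is a sketch for view-based interpretations, where C_G is the conjunction of all E_G^k and each K_i satisfies A2 and R1; it assumes only that φ ⊃ E_G(φ ∧ ψ) is valid in the system.

```latex
% Hypothesis: \varphi \supset E_G(\varphi \wedge \psi) is valid in the system.
\begin{align*}
&\text{Base case } (k = 1):
  && \varphi \supset E_G^{1}(\varphi \wedge \psi)
     \quad \text{(this is the hypothesis).} \\
&\text{Inductive step:}
  && \varphi \supset E_G^{k}(\varphi \wedge \psi) \ \text{valid} \\
& && \Longrightarrow\ E_G\varphi \supset E_G E_G^{k}(\varphi \wedge \psi) \ \text{valid}
     \quad \text{(R1 and A2 for each } K_i,\ p_i \in G\text{)} \\
& && \Longrightarrow\ \varphi \supset E_G^{k+1}(\varphi \wedge \psi) \ \text{valid}
     \quad \text{(since the hypothesis gives } \varphi \supset E_G\varphi\text{).} \\
&\text{Conclusion:}
  && \varphi \supset E_G^{k}\psi \ \text{for all } k \geq 1,
     \ \text{hence } \varphi \supset C_G\psi .
\end{align*}
```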
A very important instance of view-based knowledge interpretations, that will be used 
extensively from Section 11 on, is called the complete-history interpretation. Under this 
interpretation we have v(p_i, r, t) = h(p_i, r, t). That is, the processor's complete history is 
taken to be the view on which the processor's knowledge is based. (In a previous version 
of this paper [HM90], this was called the total view interpretation.) The complete-history 
interpretation makes the finest possible distinctions among histories. Thus, in 
a precise sense, it provides the processors with at least as much knowledge about the 
ground formulas as any other view-based interpretation. This is one of the reasons why 
the complete-history interpretation is particularly well suited for proving possibility and 
impossibility of achieving certain goals in distributed systems, and for the design and 
analysis of distributed protocols (cf. [CM86, DM90, MT88]). 

Notice that view-based knowledge interpretations ascribe knowledge to a processor 
without the processor necessarily being "aware" of this knowledge, and without the pro- 
cessor needing to perform any particular computation in order to obtain such knowledge. 
Interestingly, even if the view function v does not distinguish between possibilities at 
all, that is, if there is a single view Λ such that v(p_i, r, t) = Λ for all p_i, r, and t, the 
processors are still ascribed quite a bit of knowledge: every fact that is true at all points 
of the system is common knowledge among all the processors under this view-based 
interpretation (and in fact under all view-based interpretations). Note that the hierarchy of 
Section 3 collapses under this interpretation, with Dφ ≡ Eφ ≡ Cφ. This interpretation 
makes the coarsest possible distinctions among histories; at the other extreme we have 
the complete-history interpretation, which makes the finest possible distinctions among 
histories. 

Another reasonable view-based interpretation is one in which v(p_i, r, t) is defined to 
be p_i's local state at (r, t). (Recall that processors are state machines, and are thus 
assumed to be in some local state at every point.) This is the choice made in [FI86, 
Ros85, RK86]. Under this interpretation, a processor might "forget" facts that it knows. 
In particular, if a processor can arrive at a given state by two different message histories, 
then, once in that state, the processor's knowledge cannot distinguish between these 
two "possible pasts". In the complete-history interpretation, a processor's view encodes 
all of the processor's previous states, and therefore processors do not forget what they 
know; if a processor knows φ at a knowledge point (X, r, t), then at all knowledge points 
(X, r, t') with t' > t the processor will know that it once knew φ. Thus, while there may 
be temporary facts such as "it is 3 on my clock" which a processor will not know at 4 
o'clock, it will know at 4 o'clock that it previously knew that it was 3 o'clock. 
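
The contrast between a local-state view and the complete-history view can be made concrete with a toy example. In the Python sketch below, which is only an illustration, each point is identified with the sequence of messages a single processor has received so far, and the "local state" is assumed to be just the last message received; all names are hypothetical.

```python
def knows(points, view, fact, pt):
    """The fact holds at every point with the same view as pt."""
    return all(fact(q) for q in points if view(q) == view(pt))


# Each point is the sequence of messages one processor has received so far.
points = [(), ("a",), ("b",), ("a", "b"), ("b", "b")]

complete_history = lambda h: h                  # view = the entire history
last_message = lambda h: h[-1] if h else None   # assumed local state: last message only

first_was_a = lambda h: len(h) > 0 and h[0] == "a"

# After receiving "a", the processor knows the fact under both views.
print(knows(points, last_message, first_was_a, ("a",)))
# After then receiving "b", the local-state view has forgotten it ...
print(knows(points, last_message, first_was_a, ("a", "b")))
# ... whereas the complete-history view still knows it.
print(knows(points, complete_history, first_was_a, ("a", "b")))
```

Under the last-message view the histories ("a", "b"), ("b", "b"), and ("b",) all lead to the same state, so the processor can no longer tell which "possible past" it came from.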

Other view-based interpretations that may be of interest are ones in which a processor's 
view is identified with the contents of its memory, or with the position of its program 
counter (see [KT86] for a closer look at some of these view-based interpretations). The 
precise view-based interpretation we choose will vary from application to application. 
For proving lower bounds we frequently use the complete-history interpretation since, 
in general, if processors cannot perform an action with the knowledge they have in the 
complete-history interpretation, they cannot perform it at all. On the other hand, if 
we can show that very little information is required to perform a given action, this may 
suggest an efficient protocol for performing it. 

Although view-based knowledge interpretations are natural and useful in many appli- 
cations, they do not cover all reasonable possibilities of ascribing knowledge to processors 
in a distributed system. For example, as we have commented above, view-based knowl- 
edge interpretations ascribe knowledge to processors in a fashion that is independent of 
the processor's computational power. To the extent that we intend processors' knowl- 
edge to closely correspond to the actions they can perform, it often becomes crucial 
to define knowledge in a way that depends on the processors' computational powers 
(cf. [MT88, Mos88]). In most of the paper we deal exclusively with view-based knowledge 
interpretations. However, in order to be able to prove stronger negative results 
about the attainability of certain states of knowledge, we now give a general definition 
of knowledge interpretations, which we believe covers all reasonable cases. 

Intuitively, we want to allow any interpretation that satisfies the two properties 
discussed in Section 3: (1) that a processor's knowledge be a function of its history and (2) 
that only true things be known (so that the axiom K_i φ ⊃ φ is valid). We capture the first 
property through the notion of an epistemic interpretation. An epistemic interpretation 
X is a function assigning to every processor p_i at any given point (r, t), a set 𝒦_i^X(r, t) of 
facts in the extended language that p_i is said to "believe". 𝒦_i^X(r, t) is required to be a 
function of p_i's history at (r, t). Thus, if h(p_i, r, t) = h(p_i, r', t'), then 𝒦_i^X(r, t) = 𝒦_i^X(r', t'). 

Given an epistemic interpretation X, we now specify when a formula φ of the extended 
language holds at a point (r, t) (denoted (X, r, t) |= φ). As before, if φ is a ground 
fact, we say that (X, r, t) |= φ iff π(r, t)(φ) = true, while if φ is a conjunction or a 
negation, then its truth is defined based on the truth of its subformulas in the obvious 
way. If φ is of the form K_i ψ, then (X, r, t) |= K_i ψ iff ψ ∈ 𝒦_i^X(r, t). In this case we 
say that p_i believes ψ. The formula E_G ψ is identified with the conjunction of K_i ψ over all p_i ∈ G, 
so that (X, r, t) |= E_G ψ iff (X, r, t) |= K_i ψ for all p_i ∈ G. If φ is of the form C_G ψ, 
then (X, r, t) |= C_G ψ iff (X, r, t) |= E_G(ψ ∧ C_G ψ). Thus, common knowledge is defined 
so that the fixed point axiom holds, rather than as an infinite conjunction. Although 
this definition seems circular, it is not. In order to determine if (X, r, t) |= C_G ψ, we first 
check if (X, r, t) |= K_i(ψ ∧ C_G ψ) for all p_i ∈ G. The latter fact can be determined by 
considering the sets 𝒦_i^X(r, t). Finally, to handle distributed knowledge, we need to add 
a set 𝒦_G^X(r, t) of formulas to every point (r, t) for each set of processors G, analogous 
to the sets 𝒦_i^X(r, t) for individual processors. We define (X, r, t) |= D_G ψ if ψ ∈ 𝒦_G^X(r, t). 
The sets 𝒦_G^X(r, t) must be a function of G's joint history at (r, t). We may want to put 
some restrictions on the sets 𝒦_G^X(r, t). For example, we may require that if p_i ∈ G and 
ψ ∈ 𝒦_i^X(r, t) then ψ ∈ 𝒦_G^X(r, t) (which implies that K_i ψ ⊃ D_G ψ is valid). Since we do 
not consider distributed knowledge in interpretations that are not view based, we do not 
pursue the matter any further here. 

The knowledge axiom K_i φ ⊃ φ is not necessarily valid in epistemic interpretations. 
Indeed, that is why we have interpreted K_i φ as "processor i believes φ" in epistemic 
interpretations, since the knowledge axiom is the key property that is taken to distinguish 
knowledge from belief. A processor's beliefs may be false, although a processor cannot 
be said to know φ if φ is in fact false. Given an epistemic interpretation X and a set of 
runs R, we say that X is a knowledge interpretation for R if for all processors p_i, times t, 
runs r ∈ R and formulas φ in the extended language, it is the case that whenever 
(X, r, t) |= K_i φ holds, (X, r, t) |= φ also holds. Thus, an epistemic interpretation for R 
is a knowledge interpretation for R exactly if it makes the knowledge axiom valid in 
R. Notice that the view-based knowledge interpretations defined above are in particular 
knowledge interpretations. 

A trivial consequence of our definitions above is: 

Lemma 2: Let X be a knowledge interpretation for R and let (r, t) be a point of R. 
The following are equivalent for a nonempty subset G of processors: 

1. (X, r, t) |= C_G φ 

2. (X, r, t) |= K_i(φ ∧ C_G φ) for all processors p_i ∈ G 

3. (X, r, t) |= K_i(φ ∧ C_G φ) for some processor p_i ∈ G. 

This lemma shows that common knowledge requires simultaneity in a very strong 
sense: When a new fact becomes common knowledge in a group G, the local histories 
of all of the members of G must change simultaneously to reflect the event of the fact's 
becoming common knowledge. This point is perhaps best understood if we think of time 
as ranging over the natural numbers. Given a knowledge interpretation X, suppose that 
common knowledge does not hold at the point (r, t) but does hold at the point (r, t + 1), 
so that (X, r, t) |= ¬C_G φ and (X, r, t + 1) |= C_G φ. Then it must be the case that the local 
histories of all processors in G changed between times t and t + 1. To see this, note that 
by Lemma 2 we have (X, r, t + 1) |= K_i(φ ∧ C_G φ) for all p_i ∈ G. Suppose p_i ∈ G has 
the same local history in (r, t) and (r, t + 1). Then by our assumption that a processor's 
knowledge depends only on its local history, we have that (X, r, t) |= K_i(φ ∧ C_G φ). Now 
by Lemma 2 again, we have (X, r, t) |= C_G φ, contradicting our original assumption. 

We close this section with another trivial observation that follows easily from Lemma 2. 

Lemma 3: Let X be a knowledge interpretation for R, let r and r' be runs in R, and let 
p_i be a processor in G. If h(p_i, r, t) = h(p_i, r', t') then (X, r, t) |= C_G φ iff (X, r', t') |= C_G φ. 

Proof: Given that p_i ∈ G, we have by Lemma 2 that (X, r, t) |= C_G φ iff (X, r, t) |= 
K_i(φ ∧ C_G φ). Since h(p_i, r, t) = h(p_i, r', t'), this holds iff (X, r', t') |= K_i(φ ∧ C_G φ). Again 
by Lemma 2 this is true iff (X, r', t') |= C_G φ, and we are done. ∎ 
7 Coordinated attack revisited 



Now that we have the basic terminology with which to define distributed systems and 
knowledge in distributed systems, we can relate the ability to perform a coordinated 
attack to the attainment of common knowledge of particular facts. This in turn will 
motivate an investigation of the attainability of common knowledge in systems of various 
types. 

We formalize the coordinated attack problem as follows: We consider the generals as 
processors and their messengers as communication links between them. The generals are 
assumed to each behave according to some predetermined deterministic protocol; i.e., a 
general's actions (what messages it sends and whether it attacks) at a given point are a 
deterministic function of his history and the time on his clock. In particular, we assume 
that the generals are following a joint protocol (P_A, P_B), where A follows P_A and B 
follows P_B. We can thus identify the generals with a distributed system R, consisting 
of all possible runs of (P_A, P_B). According to the description of the coordinated attack 
problem in Section 4, the divisions do not initially have plans to attack. Formally, this 
means that the joint protocol the generals are following has the property that in the 
absence of any successful communication neither general will attack. Thus, in any run 
of R where no messages are delivered, the generals do not attack. 

We can now show that attacking requires attaining common knowledge of the attack: 

Proposition 4: Any correct protocol for the coordinated attack problem has the prop- 
erty that whenever the generals attack, it is common knowledge that they are attacking. 

Proof: Let (P_A, P_B) be a correct (joint) protocol for the coordinated attack problem, 
with R being the corresponding system. Consider a ground language consisting of a 
single fact ψ = "both generals are attacking", let π(r, t) assign a truth value to this 
formula in the obvious way at each point (r, t), and let X be the corresponding 
complete-history interpretation. Assume that the generals attack at the point (r̂, t̂) of R. We show 
that (X, r̂, t̂) |= Cψ. Our first step is to show that ψ ⊃ Eψ is valid in the system R. 
Assume that (r, t) is an arbitrary point of R. If (X, r, t) |= ¬ψ, then we trivially have 
(X, r, t) |= ψ ⊃ Eψ. If (X, r, t) |= ψ, then both generals attack at (r, t). Suppose that 
(r', t') is a point of R in which A has the same local history as in (r, t). Since A is executing 
a deterministic protocol and A attacks in (r, t), A must also attack in (r', t'). Furthermore, 
given that the protocol is a correct protocol for coordinated attack, if A attacks in (r', t'), 
then so does B, and hence (X, r', t') |= ψ. It follows that (X, r, t) |= K_A ψ; similarly we 
obtain (X, r, t) |= K_B ψ. Thus (X, r, t) |= Eψ, and again we have (X, r, t) |= ψ ⊃ Eψ. We 
have now shown that ψ ⊃ Eψ is valid in R. By the induction rule it follows that ψ ⊃ Cψ 
is also valid in R. Since (X, r̂, t̂) |= ψ, we have that (X, r̂, t̂) |= Cψ and we are done. ∎ 

Proposition 4 shows that common knowledge is a prerequisite for coordinated attack. 
Unfortunately, common knowledge is not always attainable, as we show in the next 
section. Indeed, it is the unattainability of common knowledge that is the fundamental 
reason why the generals cannot coordinate an attack. 
8 Attaining common knowledge 



Following the coordinated attack example, we first consider systems in which commu- 
nication is not guaranteed. Intuitively, communication is not guaranteed in a system 
if messages might fail to be delivered in an arbitrary fashion, independent of any other 
event in the system. Completely formalizing this intuition seems to be rather cumber- 



condition, which must be satisfied by any reasonable definition of the notion of commu- 
nication not being guaranteed, will suffice. Roughly speaking, we take communication 
not being guaranteed to correspond to two conditions. The first says that it is always 
possible that from some point on no messages will be received. The second says that if 
processor pi does not get any information to the contrary (by receiving some message), 
then pi considers it possible that none of its messages were received. 

Formally, given a system R, we say that communication in R is not guaranteed if the 
following two conditions hold: 

NG1. For all runs r and times t, there exists a run r' extending (r,t) such that r and r' 
have the same initial configuration and the same clock readings, and no messages 
are received in r' at or after time t. 

NG2. If in run r processor pi does not receive any messages in the interval (t', t), then there 
is a run r' extending (r, t') such that r and r' have the same initial configuration and 
the same clock readings, h(pi,r,t") = h(pi,r',t") for all t" < t, and no processor 
Pj 7^ pi receives a message in r' in the interval [t', t). 

Note that the requirement that r and r' have the same initial configuration already follows 
from the fact that r' extends (r, t) if all the processors have woken up by time t in run r. 
In particular, if we restricted attention to systems where all processors were up at time 
0, we would not require this condition. 

We can now show that in a system in which communication is not guaranteed, common 
knowledge is not attainable. 

Theorem 5: Let R be a system in which communication is not guaranteed, let $\mathcal{I}$ be
a knowledge interpretation for R, and let $|G| \ge 2$. Let r be a run of R, and let $r^-$ be a
run of R with the same initial configuration and the same clock readings as r, such that
no messages are received in $r^-$ up to time t. Then for all formulas $\varphi$ it is the case that

$$(\mathcal{I}, r, t) \models C_G\varphi \quad\text{iff}\quad (\mathcal{I}, r^-, t) \models C_G\varphi.$$

Proof: Fix $\varphi$. Without loss of generality, we can assume $p_1, p_2 \in G$. Let d(r) be the
number of messages received in r up to (but not including) time t. We show by induction
on k that if d(r) = k, then $(\mathcal{I}, r, t) \models C_G\varphi$ iff $(\mathcal{I}, r^-, t) \models C_G\varphi$. We assume that all
the runs mentioned in the remainder of the proof have the same initial configuration
and the same clock readings as r. First assume that d(r) = 0. Thus no messages are
received in r up to time t. Since r and $r^-$ have the same initial configuration and clock
readings, it follows that $h(p_1, r, t) = h(p_1, r^-, t)$. By Lemma || we have $(\mathcal{I}, r^-, t) \models C_G\varphi$
iff $(\mathcal{I}, r, t) \models C_G\varphi$, as desired.

Assume inductively that the claim holds for all runs $r' \in R$ with $d(r') \le k$, and
assume that $d(r) = k + 1$. Let $t' < t$ be the latest time at which a message is received in
r before time t. Let $p_j$ be a processor that receives a message at time t' in r. Let $p_i$ be
a processor in G such that $p_i \ne p_j$ (such a $p_i$ exists since $|G| \ge 2$). From property NG2
in the definition of communication not being guaranteed, it follows that there is a run
$r' \in R$ extending (r, t') such that $h(p_i, r, t'') = h(p_i, r', t'')$ for all $t'' \le t$ and all processors
$p_k \ne p_i$ receive no messages in r' in the interval [t', t). By construction, $d(r') \le k$, so
by the inductive hypothesis we have that $(\mathcal{I}, r^-, t) \models C_G\varphi$ iff $(\mathcal{I}, r', t) \models C_G\varphi$. Since
$h(p_i, r, t) = h(p_i, r', t)$, by Lemma |^ we have that $(\mathcal{I}, r', t) \models C_G\varphi$ iff $(\mathcal{I}, r, t) \models C_G\varphi$.
Thus $(\mathcal{I}, r^-, t) \models C_G\varphi$ iff $(\mathcal{I}, r, t) \models C_G\varphi$. This completes the proof of the inductive step.
∎

Note that Theorem 5 does not say that no fact can become common knowledge in
a system where communication is not guaranteed. If communication is
not guaranteed but there is a global clock to which all processors have access, then at 5
o'clock it becomes common knowledge that it is 5 o'clock.⁴ However, the theorem does
say that nothing can become common knowledge unless it is also common knowledge 
in the absence of communication. This is a basic property of systems with unreliable 
communication, and it allows us to prove the impossibility of coordinated attack. 

Corollary 6: Any correct protocol for the coordinated attack problem guarantees that 
neither party ever attacks (!). 

Proof: Recall that communication between the generals is not guaranteed (i.e., it 
satisfies conditions NG1 and NG2 above), and we assume that in the absence of any 
successful communication neither general will attack. Thus, if we take $\psi$ to be "both
generals are attacking", then $C\psi$ does not hold at any point in a run in which no messages
are received (since $\psi$ does not hold at any point of that run). Theorem 5 implies that the
generals will never attain common knowledge of $\psi$ in any run, and hence by Proposition ^
the generals will never attack. ∎

It is often suggested that for any action for which $C\varphi$ suffices, there is a k such that
$E^k\varphi$ suffices, as is the case in the muddy children puzzle. The coordinated attack problem
shows that this is false. The generals can attain $E^k\varphi$ of many facts $\varphi$ for an arbitrarily
large k (for example, if the first k messages are delivered). However, simultaneous coordinated
attack requires common knowledge (as is shown in Proposition |]); nothing less
will do.

4 We remark that the possible presence of some sort of global clock is essentially all that stops us 
from saying that no fact can become common knowledge if it was not already common knowledge at 
the beginning of a run. See Proposition ^ in Appendix B and the discussion before it for conditions 
under which it is the case that no fact can become common knowledge which was not initially common 
knowledge. 
The requirement of simultaneous attack in the coordinated attack problem is a very
strong one. It seems that real-life generals do not need a protocol that guarantees such a
strong condition, and can probably make do with one that guarantees a non-simultaneous
attack. We may want to consider weakening this requirement in order to get something
that is achievable. In Section 11 we use a variant of the argument used in Corollary 6
to show that no protocol can even guarantee that if one party attacks then the other 
will eventually attack! On the other hand, a protocol that guarantees that if one party 
attacks, then with high probability the other will attack is achievable, under appropriate 
probabilistic assumptions about message delivery. The details of such a protocol are 
straightforward and left to the reader. 

We can prove a result similar to Theorem 5 even if communication is guaranteed,
as long as there is no bound on message delivery times. A system R is said to be a 
system with unbounded message delivery times if condition NG2 of communication not 
guaranteed holds, and in addition we have: 

NG1'. For all runs r and all times t, u, with t < u, there exists a run r' extending (r, t) 
such that r' has the same initial configuration and the same clock readings as r, 
and no messages are received in r' in the interval [t, u] . 

Asynchronous systems are often defined to be systems with unbounded message delivery
times (for example, in [FLP85]). Intuitively, condition NG1' says that it is always possible
for no messages to be received for arbitrarily long periods of time, whereas condition NG1
says that it is always possible for no messages at all to be received from some time on.
In some sense, we can view NG1 as the limit case of NG1'. Notice that both systems
where communication is not guaranteed and systems with unbounded message delivery
times satisfy condition NG2. The proof of Theorem 5 made use only of NG2, not NG1,
so we immediately get



Theorem 7: Let R be a system with unbounded message delivery times, let $\mathcal{I}$ be a
knowledge interpretation for R, and let $|G| \ge 2$. Let r be a run of R, and let $r^-$ be a
run of R with the same initial configuration and the same clock readings as r, such that
no messages are received in $r^-$ up to time t. Then for all formulas $\varphi$ it is the case that
$(\mathcal{I}, r, t) \models C_G\varphi$ iff $(\mathcal{I}, r^-, t) \models C_G\varphi$. ∎



The previous results show that, in a strong sense, common knowledge is not attainable 
in a system in which communication is not guaranteed or, for that matter, in a system 
in which communication is guaranteed, but there is no bound on the message delivery 
times. However, even when all messages are guaranteed to be delivered within a fixed 
time bound, common knowledge can be elusive. To see this, consider a system consisting 
of two processors, R2 and D2, connected by a communication link. Moreover, (it is 
common knowledge that) communication is guaranteed. But there is some uncertainty 
in message delivery times. For simplicity, let us assume that any message sent from R2
to D2 reaches D2 either immediately or after exactly $\varepsilon$ seconds; furthermore, assume that
this fact is common knowledge. Now suppose that at time $t_S$, R2 sends D2 a message m
that does not contain a timestamp, i.e., does not mention $t_S$ in any way. The message
m is received by D2 at time $t_D$. Let sent(m) be the fact "the message m has been sent".
D2 doesn't know sent(m) initially. How does {R2, D2}'s state of knowledge of sent(m)
change with time?

At time $t_D$, D2 knows sent(m). Because it might have taken $\varepsilon$ time units for m to be
delivered, R2 cannot be sure that D2 knows sent(m) before $t_S + \varepsilon$. Thus, $K_R K_D\,\mathit{sent}(m)$
holds at time $t_S + \varepsilon$ and no earlier. D2 knows that R2 will not know that D2 knows
sent(m) before $t_S + \varepsilon$. Because for all D2 knows m may have been delivered immediately
(in which case $t_S = t_D$), D2 does not know that R2 knows that D2 knows sent(m) before
$t_D + \varepsilon$. Since $t_D$ might be equal to $t_S + \varepsilon$, R2 must wait until $t_S + 2\varepsilon$ before he knows that
$t_D + \varepsilon$ has passed. Thus, $K_R K_D K_R K_D\,\mathit{sent}(m)$ holds at time $t_S + 2\varepsilon$ but no earlier. This
line of reasoning can be continued indefinitely, and an easy proof by induction shows that
before time $t_S + k\varepsilon$, the formula $(K_R K_D)^k\,\mathit{sent}(m)$ does not hold, while at $t_S + k\varepsilon$ it does
hold. Thus, it "costs" $\varepsilon$ time units to acquire every level of "R2 knows that D2 knows".
Recall that $C\,\mathit{sent}(m)$ implies $(K_R K_D)^k\,\mathit{sent}(m)$ for every k. It follows that $C\,\mathit{sent}(m)$ will
never be attained!

We can capture this situation using our formal model as follows. Let $\mathit{MIN} = \lfloor t_S/\varepsilon \rfloor$,
and consider the system with a countable set of runs $\{r_i, r'_i : i \text{ an integer with } i \ge -\mathit{MIN}\}$.
If $i \ge -\mathit{MIN}$, then in run $r_i$, R2 sends the message m at time $t_S + i\varepsilon$ and D2 receives
it at the same time. In run $r'_i$, R2 again sends the message m at time $t_S + i\varepsilon$, but D2
receives it at time $t_S + (i + 1)\varepsilon$. (Note our choice of MIN guarantees that all messages
are sent at time greater than or equal to 0.) If we assume that in fact the message in the
example took $\varepsilon$ time to arrive, then the run $r'_0$ describes the true situation. However, it is
easy to see that at all times t, R2 cannot distinguish runs $r_i$ and $r'_i$ (in that its local state
is the same at the corresponding points in the two runs, assuming that only message m
is sent), while D2 cannot distinguish $r_i$ and $r'_{i-1}$ (provided $i - 1 \ge -\mathit{MIN}$).
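
The run structure just described is small enough to explore mechanically. The following Python sketch is an illustration only and not part of the formal development. It rests on several assumptions that are ours: the infinite set of runs is truncated to a finite window (I_MAX), time is measured in integer multiples of the delivery uncertainty (EPS = 1), and a processor's view is taken to be the current time together with the time of its own send or receive event. All names (RUNS, local_state, knows, first_times) are hypothetical.

```python
EPS = 1       # delivery uncertainty, used as the unit of time (assumption)
T_S = 3       # time at which R2 sends m in the "true" run (arbitrary choice)
MIN = T_S // EPS
I_MAX = 8     # truncation: the run set in the text is infinite, this window is not
HORIZON = 20  # times 0..HORIZON-1 are examined

# A run is a pair (kind, i): "r" means immediate delivery, "r'" means delay EPS.
RUNS = [(kind, i) for kind in ("r", "r'") for i in range(-MIN, I_MAX + 1)]


def send_time(run):
    return T_S + run[1] * EPS


def recv_time(run):
    return send_time(run) + (EPS if run[0] == "r'" else 0)


def local_state(proc, run, t):
    """Assumed view: current time plus the time of the processor's own event, if any."""
    event = send_time(run) if proc == "R2" else recv_time(run)
    return (t, event if event <= t else None)


def knows(proc, fact):
    """Turn a predicate on points (run, t) into 'proc knows fact'."""
    def known(run, t):
        mine = local_state(proc, run, t)
        return all(fact(other, t) for other in RUNS
                   if local_state(proc, other, t) == mine)
    return known


def sent(run, t):
    return send_time(run) <= t


def first_times(max_k, run=("r'", 0)):
    """Earliest time at which (K_R K_D)^k sent(m) holds in `run`, for k = 1..max_k."""
    fact, result = sent, []
    for _ in range(max_k):
        fact = knows("R2", knows("D2", fact))
        result.append(next(t for t in range(HORIZON) if fact(run, t)))
    return result


print(first_times(3))
```

Within the truncated window, each additional level of "R2 knows that D2 knows" should become true one EPS later than the previous one. The truncation distorts knowledge only near the largest run index, so small values of max_k relative to I_MAX are the ones to trust.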

Our discussion of knowledge in a distributed system is motivated by the fact that 
we can view processors' actions as being based on their knowledge. Consider an eager 
epistemic interpretation $\mathcal{I}$ under which R2 believes $C\,\mathit{sent}(m)$ as soon as it sends the
message m, while D2 believes $C\,\mathit{sent}(m)$ as soon as it receives m. Clearly, $\mathcal{I}$ is not a
knowledge interpretation, because it is not knowledge consistent (R2 might believe that
D2 knows sent(m), when in fact D2 does not). However, once D2 receives m, which
happens at most $\varepsilon$ time units after R2 starts believing $C\,\mathit{sent}(m)$, it is easy to see that
$C\,\mathit{sent}(m)$ does indeed hold! In a sense, Lemma ^| says that attaining common knowledge
requires a certain kind of "natural birth"; it is not possible to attain it consistently
unless simultaneity is attainable. But if one is willing to give up knowledge consistency
(i.e., abandon the knowledge axiom $K_i\varphi \supset \varphi$) for short intervals of time, something very similar to
common knowledge can be attained.

The period of up to $\varepsilon$ time units during which R2 and D2's "knowledge" might be inconsistent
might have many negative consequences. If the processors need to act based on
whether $C\,\mathit{sent}(m)$ holds during that interval, they might not act in an appropriately coordinated
way. This is a familiar problem in the context of distributed database systems.
There, committing a transaction roughly corresponds to entering into an agreement that 
the transaction has taken place in the database. However, in general, different sites of 
the database commit transactions at different times (although usually all within a small 
time interval). When a new transaction is being committed there is a "window of vulner- 
ability" during which different sites might reflect inconsistent histories of the database. 
However, once all sites commit the transaction, the history of the database that the sites 
reflect becomes consistent (at least as far as the particular transaction is concerned). In 
Section 13 we return to the question of when an "almost knowledge consistent" version 
of common knowledge can be safely used "as if it were" common knowledge. 

Returning to the R2-D2 example, note that it is the uncertainty in relative message 
delivery time that makes it impossible to attain common knowledge, and not the fact 
that communication is not instantaneous. If it were common knowledge that messages
took exactly $\varepsilon$ time units to arrive, then sent(m) would be common knowledge at time
$t_S + \varepsilon$ (and the system would consist only of run $r'_0$).

Another way of removing the uncertainty is by having a common (global) clock in the 
system. Suppose that there is such a clock. Consider what would happen if R2 sends D2 
the following message m':

"This message is being sent at time $t_S$; m."

Since there is a global clock and it is guaranteed that every message sent by R2 is
delivered within $\varepsilon$ time units, the fact that R2 sent m' to D2 would again become common
knowledge at time $t_S + \varepsilon$! In this case, the system would consist of two runs, $r_0$ and $r'_0$.
At time $t_S + \varepsilon$, D2 would know which of the two was actually the case, although R2
would not (although D2 could tell him by sending a message).

It seems that common knowledge is attainable in the latter two cases due to the 
possibility of simultaneously making the transition from not having common knowledge 
to having common knowledge (at time $t_S + \varepsilon$). The impossibility of doing so in the
first case was the driving force behind the extra cost in time incurred in attaining each 
additional level of knowledge. 

Lemma [| already implies that when $C\varphi$ first holds all processors must come to believe
$C\varphi$ simultaneously. In particular, this means that all of the processors' histories must
change simultaneously. However, strictly speaking, practical systems cannot guarantee 
absolute simultaneity. In particular, we claim that essentially all practical distributed 
systems have some inherent temporal uncertainty. There is always some uncertainty 
about the precise instant at which each processor starts functioning, and about exactly 
how much time each message takes to be delivered. In Appendix B we give a precise 
formulation of the notion of temporal imprecision, which captures these properties, and 
use methods derived from [DHS86] and [HMM85] to prove the following result:

Theorem 8: Let R be a system with temporal imprecision, let $\mathcal{I}$ be a knowledge
interpretation for R, and let $|G| \ge 2$. Then for all runs $r \in R$, times t, and formulas $\varphi$
it is the case that $(\mathcal{I}, r, t) \models C_G\varphi$ iff $(\mathcal{I}, r, 0) \models C_G\varphi$.

Since practical systems turn out to have temporal imprecision, Theorem 8 implies
that, strictly speaking, common knowledge cannot be attained in practical distributed
systems! In such systems, we have the following situation: a fact $\varphi$ can be known to a
processor without being common knowledge, or it can be common knowledge (in which
case that processor also knows $\varphi$), but due to (possibly negligible) imperfections in the
system's state of synchronization and its communication medium, there is no way of
getting from the first situation to the second! Note that if there is a global clock, then
there cannot be any temporal imprecision. Thus, it is consistent with Theorem 8 that
common knowledge is attainable in a system with a global clock. 

Observe that we can now show that, formally speaking, even people cannot attain 
common knowledge of any new fact! Consider the father publicly announcing m to the 
children in the muddy children puzzle. Even if we assume that it is common knowledge 
that the children all hear whatever the father says and understand it, there remains some 
uncertainty as to exactly when each child comes to know (or comprehend) the father's 
statement. Thus, it is easy to see that the children do not immediately have common 
knowledge of the father's announcement. Furthermore, for similar reasons the father's 
statement can never become common knowledge. 

9 A paradox? 

There is a close correspondence between agreements, coordinated actions, and common 
knowledge. We have argued that in a precise sense, reaching agreements and coordinating 
actions in a distributed system requires attaining common knowledge of certain facts. 
However, in the previous section we showed that common knowledge cannot be attained in 
practical distributed systems! We are faced with a seemingly paradoxical situation on two 
accounts. First of all, these results are in contradiction with practical experience, in which 
operations such as reaching agreement and coordinating actions are routinely performed 
in many actual distributed systems. It certainly seems as if these actions are performed 
in such systems without the designers having to worry about common knowledge (and 
despite the fact that we have proved that common knowledge is unattainable!). Secondly, 
these results seem to contradict our intuitive feeling that common knowledge is attained 
in many actual situations; for example, by the children in the muddy children puzzle. 

Where is the catch? How can we explain this apparent discrepancy between our formal 
treatment and practical experience? What is the right way to interpret our negative 
results from the previous section? Is there indeed a paradox here? Or perhaps we are 
using a wrong or useless definition of common knowledge? 

We believe that we do have a useful and meaningful definition of common knowledge. 
However, a closer inspection of the situation is needed in order to understand the subtle
issues involved. First of all, we shall see that only rather strong notions of coordination 
in a distributed system require common knowledge. Common knowledge corresponds to 
absolutely simultaneous coordination, which is more than is necessary in many particular 
applications. For many other types of coordination, weaker states of knowledge suffice. 
In the coming sections we investigate a variety of weaker states of knowledge that are 
appropriate for many applications. Furthermore, in many cases practical situations (and 
practical distributed systems) can be faithfully modeled by a simplified abstract model, 
in which common knowledge is attainable. In such a case, when facts become common 
knowledge in the abstract model it may be perfectly safe and reasonable to consider them 
to be common knowledge when deciding on actions to be performed in the actual system. 
We discuss this in greater detail in Section 13.



10 Common knowledge revisited 

In Section 8 we showed that common knowledge is not attainable in practical distributed 
systems under any reasonable interpretation of knowledge (i.e., in any epistemic inter- 
pretation). Our purpose in the coming sections is to investigate what states of knowledge 
are attainable in such systems. For that purpose, we restrict our attention to view-based 
interpretations of knowledge, since they seem to be the most appropriate for many ap- 
plications in distributed systems. Under view-based interpretations, it seems useful to 
consider an alternative view of common knowledge. 

Recall the children's state of knowledge of the fact m in the muddy children puzzle. If 
we assume that it is common knowledge that all children comprehend m simultaneously, 
then after the father announces m, the children attain Cm. However, when they attain 
Cm it is not the case that the children learn the infinitely many facts of the form $E^k m$
separately. Rather, after the father speaks, the children are in a state of knowledge S
characterized by the fact that every child knows both that m holds and that S holds.
Thus, S satisfies the equation

$$S = E(m \wedge S).$$

The fixed point axiom of Section 6 says that under a view-based interpretation, $C_G\varphi$
is a solution for X in an analogous fixed point equation, namely

$$X = E_G(\varphi \wedge X).$$

Now this equation has many solutions, including, for example, both false and
$C_G(\varphi \wedge \psi)$, for any formula $\psi$. $C_G\varphi$ can be characterized as being the greatest fixed point of
the equation; i.e., a fixed point that is implied by all other solutions. (The least fixed
point of this equation is false, since it implies all other solutions.) As our discussion of
common knowledge in the case of the muddy children puzzle suggests, expressing common
knowledge as a greatest fixed point of such an equation seems to correspond more closely
to the way it actually arises. We sketch a semantics for a propositional view-based logic
of knowledge with fixed points in Appendix A. This alternative point of view, considering
common knowledge as the greatest fixed point of such an equation, will turn out to be
very useful when we attempt to define related variants of common knowledge.
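
On a finite set of points, the greatest fixed point of such an equation can be computed by starting from "true" (the set of all points) and repeatedly applying the operator until nothing changes. The Python sketch below illustrates this under assumptions of our own: the points, the two processors p1 and p2, and their indistinguishability classes are invented for the example, and a view-based interpretation is modelled simply as a partition of the points for each processor.

```python
def everyone_knows(views, fact):
    """Points at which every agent knows `fact` (a set of points).

    `views` maps each agent to its partition of the points into
    indistinguishability classes; an agent knows `fact` at a point when
    the whole class containing that point lies inside `fact`.
    """
    result = None
    for classes in views.values():
        known = set()
        for cls in classes:
            if cls <= fact:
                known |= cls
        result = known if result is None else result & known
    return result


def common_knowledge(views, points, phi):
    """Greatest fixed point of X = E(phi AND X), by iteration from 'true'."""
    x = set(points)
    while True:
        nxt = everyone_knows(views, phi & x)
        if nxt == x:
            return x
        x = nxt


# Hypothetical four-point model with two agents.
POINTS = {"a", "b", "c", "d"}
VIEWS = {
    "p1": [{"a", "b"}, {"c"}, {"d"}],
    "p2": [{"a"}, {"b", "c"}, {"d"}],
}
PHI = {"a", "b", "d"}

print(sorted(everyone_knows(VIEWS, PHI)))
print(sorted(common_knowledge(VIEWS, POINTS, PHI)))
```

In this toy model the fact holds and is known to everyone at point a, yet it is common knowledge only at point d. The empty set ("false") is also a fixed point of the operator, which is why the iteration has to start from the top rather than from the bottom.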

11 $\varepsilon$-common knowledge and $\Diamond$-common knowledge

Since, strictly speaking, common knowledge cannot be attained in practical distributed 
systems, it is natural to ask what states of knowledge can be obtained by the commu- 
nication process. In this section we consider what states of knowledge are attained in 
systems in which communication delivery is guaranteed but message delivery times are 
uncertain. For ease of exposition, we restrict our attention to view-based interpretations 
of knowledge here and in the next section. 

We begin by considering synchronous broadcast channels of communication; i.e., ones
where every message sent is received by all processors, and there are constants L and $\varepsilon$
such that all processors receive the message between L and L + $\varepsilon$ time units from the time
it is sent. We call $\varepsilon$ the broadcast spread of such a channel. Recall that the properties
of the system hold throughout all of its runs and hence are common knowledge. In 
particular, the properties of the broadcast channel are common knowledge under any 
view-based interpretation. 

Let us now consider the state of knowledge of the system when a processor $p_i$ receives
a broadcast message m. Clearly $p_i$ knows that within an interval of $\varepsilon$ time units around
the current time everyone (receives m and) knows sent(m). But $p_i$ also knows that any
other processor that receives m will know that all processors will receive m within such
an $\varepsilon$ interval. Let us define within an $\varepsilon$ interval, everyone knows $\varphi$, denoted $E^\varepsilon_G\varphi$, to hold
if there is an interval of $\varepsilon$ time units containing the current time such that each processor
comes to know $\varphi$ at some point in this interval. Formally, we have: $(\mathcal{I}, r, t) \models E^\varepsilon_G\varphi$ if
there exists an interval $I = [t', t' + \varepsilon]$ such that $t \in I$ and for all $p_i \in G$ there exists $t_i \in I$
for which $(\mathcal{I}, r, t_i) \models K_i\varphi$. Let $\psi$ be "some processor has received m". In a synchronous
broadcast system as described above, we clearly have that $\psi \supset E^\varepsilon\psi$ is valid.
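
The formal clause above can be turned into a direct check. The following Python sketch is a simplification under stated assumptions: time is discrete (integers), only windows with integer endpoints are tried, and the per-processor sets of times at which the fact is known are supplied as data rather than derived from a run. The function name and the example channel parameters (L = 2, spread 3) are hypothetical.

```python
def everyone_knows_within(knows_at, t, eps):
    """Check the 'within an eps interval, everyone knows' clause at integer time t.

    `knows_at` maps each processor in G to the set of integer times at which
    it knows the fact. The clause holds at t if some window [start, start + eps]
    contains t and, for every processor, at least one time at which it knows the fact.
    """
    for start in range(t - eps, t + 1):           # every integer window containing t
        window = range(start, start + eps + 1)
        if all(any(ti in window for ti in times) for times in knows_at.values()):
            return True
    return False


# Hypothetical broadcast sent at time 0 on a channel with L = 2 and spread 3:
# p1 receives at 2, p2 at 4, p3 at 5, and each knows sent(m) from then on.
KNOWS_AT = {
    "p1": set(range(2, 11)),
    "p2": set(range(4, 11)),
    "p3": set(range(5, 11)),
}

print(everyone_knows_within(KNOWS_AT, 1, 3))
print(everyone_knows_within(KNOWS_AT, 2, 3))
```

In this example the clause fails one time unit before the first reception and holds from the moment the first processor receives m, which matches the informal argument that reception by some processor is enough.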

We are thus in a state of knowledge that is analogous to common knowledge; here,
however, rather than everyone knowing $\psi$ at the same instant, they all come to know $\psi$
within an interval of $\varepsilon$ time units. We call this state of group knowledge $\varepsilon$-common
knowledge, denoted $C^\varepsilon$. The formal definition of $C^\varepsilon_G\varphi$ is as the greatest fixed point of the
equation:

$$X = E^\varepsilon_G(\varphi \wedge X).$$

We refer the reader to Appendix A for a rigorous definition. The fact $\psi$ above, stating that
some processor received the message m, has the property that $\psi \supset C^\varepsilon\psi$. In addition, as
$\psi \supset \mathit{sent}(m)$ is valid, it is also the case that $\psi \supset C^\varepsilon\,\mathit{sent}(m)$. Thus, when some processor
receives m it becomes $\varepsilon$-common knowledge that m has been sent.
As a straightforward consequence of its definition, $C^\varepsilon$ satisfies the appropriate analogues
of the fixed point axiom C1 and the induction rule C2 of Section 6 (replacing E
by $E^\varepsilon$ and C by $C^\varepsilon$). Note that we did not define $C^\varepsilon_G\varphi$ as an infinite conjunction of
$(E^\varepsilon_G)^k\varphi$, $k \ge 1$. While it is not hard to show that $C^\varepsilon_G\varphi$ implies this infinite conjunction,
it is not equivalent to it; however, giving a detailed counterexample is beyond the scope
of this paper. (We give an example of a similar phenomenon below.) The fixed point
definition is the one that is appropriate for our applications. Just as common knowledge
corresponds to simultaneous actions in a distributed system, $\varepsilon$-common knowledge corresponds
to actions that are guaranteed to be performed within $\varepsilon$ time units of one another.
This is what we get from the fixed point axiom C1, which does not hold in general for
the infinite conjunction. We are often interested in actions that are guaranteed to be
performed within a small time window. For example, in an "early stopping" protocol for
Byzantine agreement (cf. [DRS90]), all correct processors are guaranteed to decide on a
common value within $\varepsilon$ time units of each other. It follows that once the first processor
decides, the decision value is $\varepsilon$-common knowledge.⁵

There is one important special case where it can be shown that the fixed point definition
of $C^\varepsilon_G\varphi$ is equivalent to the infinite conjunction. This arises when we restrict
attention to complete-history interpretations and stable facts, facts that once true, remain
true. Many facts of interest in distributed systems applications, such as "$\varphi$ held at
some point in the past", "the initial value of x is 1", or "$\varphi$ holds at time t on $p_i$'s clock",
are stable. If $\varphi$ is stable, then it is not hard to check that in complete-history interpretations,
we have that $E^\varepsilon_G\varphi$ holds iff $E_G\varphi$ will hold in $\varepsilon$ time units. As a straightforward
consequence of this observation, we can show that in complete-history interpretations,
for a stable fact $\varphi$ we do have that $C^\varepsilon_G\varphi$ holds iff $(E^\varepsilon_G)^k\varphi$ holds for all $k \ge 1$.⁶

It is not hard to verify that of the properties of S5, $C^\varepsilon$ satisfies only A3 (positive introspection)
and R1 (the rule of necessitation). The failure of $C^\varepsilon$ to satisfy the knowledge
axiom and the consequence closure axiom can be traced to the failure of $E^\varepsilon$ to satisfy
these axioms. The problem is that $E^\varepsilon\varphi$ only requires that $\varphi$ hold and be known at some
(not all!) of the points in the $\varepsilon$ interval $I$. Indeed, it is not hard to construct an example
in which $E^\varepsilon\varphi \wedge E^\varepsilon\neg\varphi$ holds. We remark that if we restrict attention to stable facts and
complete-history interpretations, then consequence closure does hold for both $E^\varepsilon$ and $C^\varepsilon$.

It is interesting to compare $\varepsilon$-common knowledge with common knowledge. Clearly,
$C\varphi \supset C^\varepsilon\varphi$ is valid. However, since synchronous broadcast channels are implementable in
systems where common knowledge is not attainable, the converse does not hold. Thus,
$\varepsilon$-common knowledge is strictly weaker than common knowledge. Moreover, note that
while $C\varphi$ is a static state of knowledge, which can be true of a point in time irrespective
5 The situation there is in fact slightly more complicated since only the correct processors are required
to decide; see [MT88] for definitions of knowledge appropriate for such situations.

6 We remark that in earlier versions of this paper, we restricted attention to complete-history interpretations
and stable facts, and defined $E^\varepsilon_G\varphi$ as $\bigcirc^\varepsilon E_G\varphi$, where $\bigcirc^\varepsilon\psi$ is true at a point (r, t) iff $\psi$ is true
$\varepsilon$ time units later, at (r, t + $\varepsilon$). By the comments above, our current definition is a generalization of our
former definition.
of its past or future, $C^\varepsilon\varphi$ is a notion that is essentially temporal. Whether or not it holds
depends on what processors will know in an $\varepsilon$ interval around the current time.

For any message m broadcast on a channel with broadcast spread $\varepsilon$, the fact sent(m)
becomes $\varepsilon$-common knowledge L time units after m is broadcast (in particular, as soon
as it is sent if L = 0). Upon receiving m, a processor $p_i$ knows that $C^\varepsilon\,\mathit{sent}(m)$ holds,
i.e., $K_i C^\varepsilon\,\mathit{sent}(m)$ holds. Returning to R2 and D2's communication problem, we can view
them as a synchronous broadcast system, and indeed they attain $C^\varepsilon\,\mathit{sent}(m)$ immediately
when R2 sends the message m. (Note that L = 0 in this particular example; the interested
reader is invited to check that R2 and D2 in fact achieve $\varepsilon/2$-common knowledge of
sent(m) at time $t_S + \varepsilon/2$.)

Having discussed states of knowledge in synchronous broadcast channels, we now 
turn our attention to systems in which communication is asynchronous: no bound on 
the delivery times of messages in the system exists. Consider the state of knowledge of 
sent(m) in a system in which m is broadcast over an asynchronous channel: a channel 
that guarantees that every message broadcast will eventually reach every processor. Upon 
receiving m, a processor knows sent(m), and knows that every other processor either has 
already received m or will eventually receive m. This situation, where it is common 
knowledge that if m is sent then everyone will eventually know that m has been sent, 
gives rise to a weak state of group knowledge which we call eventual common knowledge. 

We define everyone in G will eventually have known $\varphi$, denoted $E^\Diamond_G\varphi$, to hold if for
every processor in G there is some time during the run at which it knows $\varphi$. Formally,
$(\mathcal{I}, r, t) \models E^\Diamond_G\varphi$ if for all $p_i \in G$ there exists $t_i \ge 0$ such that $(\mathcal{I}, r, t_i) \models K_i\varphi$. We remark
that if we restrict attention to stable facts $\varphi$ and complete-history interpretations, then
$E^\Diamond_G\varphi$ is equivalent to $\Diamond E_G\varphi$, that is, eventually everyone in G knows $\varphi$.⁷ We define the
state of $\Diamond$-common knowledge (read eventual common knowledge), denoted by $C^\Diamond$, by
taking $C^\Diamond_G\varphi$ to be the greatest fixed point of the equation:

$$X = E^\Diamond_G(\varphi \wedge X).$$

Notice that we again used the fixed point definition rather than one in terms of an infinite
conjunction of $(E^\Diamond_G)^k\varphi$, $k \ge 1$. Our definition implies the infinite conjunction but, as we
show by example below, it is not equivalent to the infinite conjunction, even if we restrict
to stable facts and complete-history interpretations.

Our motivation for considering the fixed point definition is the same as it was in the
case of $\varepsilon$-common knowledge. The fixed point definition gives us analogues to C1 and
C2; as a consequence, $\Diamond$-common knowledge corresponds to events that are guaranteed
to take place at all sites eventually. For example, in some of the work on variants of
the Byzantine Agreement problem discussed in the literature (cf. [DRS90]), the kind of
agreement sought is one in which whenever a correct processor decides on a given value,



7 Formally, we take $\Diamond\psi$ to be true at a point (r, t) if $\psi$ is true at some point (r, t') with $t' \ge t$. In
an earlier version of this paper, we defined $E^\Diamond_G\varphi$ as $\Diamond E_G\varphi$. Again, by the comments above, our current
definition is a generalization of our former one.
each other correct processor is guaranteed to eventually decide on the same value. The
state of knowledge of the decision value that the processors attain in such circumstances is
$\Diamond$-common knowledge. Also, in asynchronous error-free broadcast channels, a processor
knows that sent(m) is $\Diamond$-common knowledge when it receives the message m.

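$C^\Diamond_G$ is the weakest temporal notion of common knowledge that we have introduced.
In fact, we now have a hierarchy of the temporal notions of common knowledge. For any
fact $\varphi$ and $\varepsilon_1 < \cdots < \varepsilon_k < \varepsilon_{k+1} < \cdots$, we have:

$$C_G\varphi \supset C^{\varepsilon_1}_G\varphi \supset \cdots \supset C^{\varepsilon_k}_G\varphi \supset C^{\varepsilon_{k+1}}_G\varphi \supset \cdots \supset C^\Diamond_G\varphi.$$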

We next consider how $C^\varepsilon$ and $C^\Diamond$ are affected by communication not being guaranteed.
Recall that Theorem 5 implies that if communication is not guaranteed, then common
knowledge is independent of the communication process. A fact only becomes common
knowledge if it becomes common knowledge in the absence of messages. Interestingly,
the obvious analogue of Theorem 5 does not hold for $C^\varepsilon$ and $C^\Diamond$. Indeed, it is possible to
construct a situation in which $C^\varepsilon\psi$ is attained only if communication is not sufficiently
successful. For example, consider a system consisting of R2 and D2 connected by a
two-way link. Communication along the link is not guaranteed, R2 and D2's clocks are
perfectly synchronized, and both of them run the following protocol: At time 0, send the
message "OK". For all natural numbers $k > 0$, if you have received $k$ "OK" messages
by time $k$ on your clock, send an "OK" message at time $k$; otherwise, send nothing.
Let $\psi$ = "it is time $k$ where $k \ge 1$ and some message sent at or before time $k - 1$ was
not delivered within one time unit." Assume a complete-history interpretation for this
system and fix $\varepsilon = 1$. It is easy to see that $\psi \supset E^\varepsilon\psi$ is valid in this system. For suppose
that at time $k$ the fact $\psi$ holds because one of R2's messages was not delivered to D2. D2
knows $\psi$ at time $k$ and, according to the protocol, will not send a message to R2 at time
$k$. Thus, by time $k + 1$, R2 will also know $\psi$ (if it didn't know it earlier). The induction
rule implies that $\psi \supset C^\varepsilon\psi$ is also valid in the system. If $r$ is a run of the system where
no messages are received, then it is easy to see that $\psi$ holds at $(r, 1)$, and hence so does
$C^\varepsilon\psi$. However, $C^\varepsilon\psi$ does not hold at $(r', 1)$ if $r'$ is a run where all messages are delivered
within one time unit. (The same example works for $C^\Diamond\psi$.)
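
The "OK" protocol of this example can be explored with a small simulation. The following
is only a sketch under simplifying assumptions that are not part of the text: time is
restricted to integer ticks, and a message is either lost or delivered exactly one tick
after it is sent. The function name `simulate` and its arguments are hypothetical.

```python
def simulate(lost, horizon):
    """Simulate the "OK" protocol between R2 and D2 on integer clock ticks.

    lost: set of (sender, send_time) pairs for messages that are never
    delivered; every other message arrives exactly one tick after it is
    sent (a simplifying assumption of this sketch).
    Returns (psi_time, knows): psi_time is the first time at which psi
    holds, and knows[p] is the first time at which p can tell that it holds.
    """
    other = {"R2": "D2", "D2": "R2"}
    received = {"R2": 0, "D2": 0}      # "OK" messages received so far
    knows = {"R2": None, "D2": None}
    psi_time = None
    arriving = []                       # receivers of messages due next tick
    for k in range(horizon + 1):
        for receiver in arriving:       # deliveries scheduled for time k
            received[receiver] += 1
        arriving = []
        for p in ("R2", "D2"):
            if received[p] < k and knows[p] is None:
                knows[p] = k            # a message due by time k is missing
            if received[p] == k:        # protocol rule (covers k = 0 too)
                if (p, k) in lost:
                    if psi_time is None:
                        psi_time = k + 1
                else:
                    arriving.append(other[p])
    return psi_time, knows


# R2's message sent at time 2 is lost; everything else is delivered.
psi_time, knows = simulate({("R2", 2)}, horizon=6)
```

In this run the two processors come to know $\psi$ within one time unit of each other,
which is the behavior that the argument for $\psi \supset E^\varepsilon\psi$ with $\varepsilon = 1$ relies on. In a run
with no losses, `psi_time` stays `None`.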

In the example above, successful communication in a system where communication
is not guaranteed can prevent $C^\varepsilon\psi$ (resp. $C^\Diamond\psi$) from holding. However, the following
theorem shows that we can get a partial analogue to Theorem 5 for $C^\varepsilon$ and $C^\Diamond$. Intuitively,
it states that if $C^\varepsilon_G\varphi$ (resp. $C^\Diamond_G\varphi$) does not hold in the absence of successful communication,
then $C^\varepsilon_G\varphi$ (resp. $C^\Diamond_G\varphi$) does not hold regardless of how successful communication may turn
out to be. More formally,

Theorem 9: Let $R$ and $G$ be as in Theorem 5, and let $\mathcal{I}$ be a view-based interpretation.
Let $r^-$ be a run of $R$ where no messages are received. If $(\mathcal{I}, r^-, t) \not\models C^\varepsilon_G\varphi$ (resp.
$(\mathcal{I}, r^-, t) \not\models C^\Diamond_G\varphi$) for all times $t$, then $(\mathcal{I}, r, t) \not\models C^\varepsilon_G\varphi$ (resp. $(\mathcal{I}, r, t) \not\models C^\Diamond_G\varphi$)
for all runs $r$ with the same initial configuration and the same clock readings as $r^-$ and
all times $t$.

Proof: We sketch the proof for $C^\varepsilon_G\varphi$; the proof for $C^\Diamond_G\varphi$ is analogous. We assume that
all runs mentioned in this proof have the same initial configuration and the same clock
readings as $r^-$. If $r$ is a run such that $C^\varepsilon_G\varphi$ holds at some point in $r$, let $t_j(r)$ be the first
time in $r$ that processor $p_j \in G$ knows $C^\varepsilon_G\varphi$. Let $t(r) = \max\{t_j(r) : p_j \in G\}$, and let
$d(r)$ be the number of messages that are received in $r$ up to (but not including) $t(r)$. We
show by induction on $k$ that if $r$ is a run such that $C^\varepsilon_G\varphi$ holds at some point in $r$, then
$d(r) \ne k$. This will show that in fact $C^\varepsilon_G\varphi$ can never hold.

If $d(r) = 0$ and $C^\varepsilon_G\varphi$ holds at some point in $r$, choose some $p_i \in G$ and let $t_i = t_i(r)$.
Then we have that $(\mathcal{I}, r, t_i) \models K_i C^\varepsilon_G\varphi$. Clearly $h(p_i, r, t_i) = h(p_i, r^-, t_i)$, so
$(\mathcal{I}, r^-, t_i) \models K_i C^\varepsilon_G\varphi$. By the knowledge axiom, we have that $(\mathcal{I}, r^-, t_i) \models C^\varepsilon_G\varphi$,
contradicting the hypothesis of the theorem.

For the inductive step, assume that $d(r) = k + 1$ and let $t = t(r)$. We now proceed
as in the proof of Theorem 5. Let $p_j$ be a processor receiving the last message received
in $r$ before time $t$. Let $t'$ be the time at which $p_j$ receives this message. Let $p_i$ be
a processor in $G$ such that $p_i \ne p_j$ and let $t_i = t_i(r)$. Since communication is not
guaranteed, there exists a run $r'$ extending $(r, t')$ such that (1) no messages are received
in $r'$ at or after time $t$, (2) $h(p_i, r, t'') = h(p_i, r', t'')$ for all $t'' \le t$, and (3) all processors
$p_k \ne p_i$ receive no messages in the interval $[t', t)$. By construction, at most $k$ messages
are received altogether in $r'$, so $d(r') \le k$. By the induction hypothesis we have that
$(\mathcal{I}, r', t'') \models \neg C^\varepsilon_G\varphi$ for all $t''$. It follows that $(\mathcal{I}, r', t_i) \models \neg K_i C^\varepsilon_G\varphi$. But since we assumed
$(\mathcal{I}, r, t_i) \models K_i C^\varepsilon_G\varphi$ and $h(p_i, r, t_i) = h(p_i, r', t_i)$, this gives us a contradiction. ∎

We can now use Theorem 9 to prove an analogue to Corollary 6, which shows that
if communication is not guaranteed, then there is no protocol for eventually coordinated
attack.

Proposition 10: In the coordinated attack problem, any protocol that guarantees that 
whenever either party attacks the other party will eventually attack, is a protocol in 
which necessarily neither party attacks. 

Proof: The proof is analogous to that of Corollary 6. Assume that $(P_A, P_B)$ is a
joint protocol that guarantees that if either party attacks then they both eventually
attack, and let $R$ be the corresponding system. Let $\psi$ = "At least one of the generals
has started attacking". We first show that when either general attacks, then eventual
common knowledge of $\psi$ must hold. Since the protocol guarantees that whenever one
general attacks the other one eventually attacks, it is easy to see that a general that has
decided to attack knows $\psi$ and knows that eventually both generals will know $\psi$. Thus,
by the induction rule for $C^\Diamond$, when a general attacks $C^\Diamond\psi$ holds. Since in every run of
the protocol in which no messages are received no party attacks (and hence neither $\psi$
nor $C^\Diamond\psi$ hold in such runs), by Theorem 9, the protocol $(P_A, P_B)$ guarantees that neither
general will ever attack. ∎

Theorem 9 allows us to construct an example in which the infinite conjunction
of $(E^\Diamond)^k\psi$ holds, but $C^\Diamond\psi$ does not. In the setting of the coordinated attack problem,
let $\psi$ be "General A is in favor of attacking". Consider a run in which all messengers
arrive safely, and messages are acknowledged ad infinitum. Clearly, assuming a
complete-history interpretation, for all $k$ it is the case that $E^k\psi$ holds after the $k$th message is
delivered. It follows that $(E^\Diamond)^k\psi$ holds at time 0 for all $k$. However, Theorem 9 implies that $C^\Diamond\psi$
never holds in this run. It follows that $C^\Diamond\psi$ is not equivalent to the infinite conjunction
of $(E^\Diamond)^k\psi$ even in the case of stable facts $\psi$ and complete-history interpretations.

Recall that the proof that unreliable communication cannot affect what facts are
common knowledge carried over to (reliable) asynchronous communication. Our proof in
Theorem 9 clearly does not carry over. In fact, a message broadcast over a reliable
asynchronous channel does become eventual common knowledge. However, it is possible to
show that asynchronous channels cannot be used in order to attain $\varepsilon$-common knowledge:

Theorem 11: Let $R$ be a system with unbounded delivery times and let $|G| \ge 2$.
Suppose there is some run $r^-$ in $R$ in which no messages are delivered in the interval
$[0, t + \varepsilon)$ such that $(\mathcal{I}, r^-, t) \not\models C^\varepsilon_G\varphi$. Then for all runs $r$ in $R$ with the same initial
configuration and the same clock readings as $r^-$, we have $(\mathcal{I}, r, t) \not\models C^\varepsilon_G\varphi$.

Sketch of Proof: The proof essentially follows the proof of Theorems 5 and 9. We
proceed by induction on $d(r)$, the number of messages received in $r$ up to time $t$. Details
are left to the reader. ∎

Thus, asynchronous communication channels are of no use for coordinating actions 
that are guaranteed to be performed at all sites within a predetermined fixed time bound. 

12 Time stamping: using relativistic time 

Real time is not always the appropriate notion of time to consider in a distributed system. 
Processors in a distributed system often do not have access to a common source of real 
time, and their clocks do not show identical readings at any given real time. Furthermore, 
the actions taken by the processors rarely actually depend on real time. Rather, time is 
often used mainly for correctly sequencing events at the different sites and for maintaining 
"consistent" views of the state of the system. In this section we consider states of 
knowledge relative to relativistic notions of time. 

Consider the following scenario: R2 knows that R2 and D2's clocks differ by at most $\delta$,
and that any message R2 sends D2 will arrive within $\varepsilon$ time units. R2 sends D2 the
following message $m'$:

"This message is being sent at $t_S$ on R2's clock, and will reach D2 by $t_S + \varepsilon + \delta$
on both clocks; $m$."

Let us denote $t_S + \varepsilon + \delta$ by $T_0$. Now, at time $T_0$ on his clock, R2 would like to claim
that $sent(m')$ is common knowledge. Is it? Well, we know by now that it is not, but it
is interesting to analyze this situation. Before we do so, let us introduce a relativistic
formalism for knowledge, which we call timestamped knowledge: We denote "at time $T$
on his clock, $p_i$ knows $\varphi$" by $K^T_i\varphi$. $T$ is said to be the timestamp associated with this
knowledge. We then define

$$E^T_G\varphi \equiv \bigwedge_{p_i \in G} K^T_i\varphi.$$

$E^T_G\varphi$ corresponds to everyone knowing $\varphi$ individually at time $T$ on their own clocks.
Notice that for $T_0$ as above, $sent(m') \supset E^{T_0}_G sent(m')$. It is natural to define the
corresponding relativistic variant of common knowledge, $C^T$, which we call timestamped
common knowledge, so that $C^T_G\varphi$ is the greatest fixed point of the equation

$$X \equiv E^T_G(\varphi \wedge X).$$

So, in any run where the message $m'$ is sent, R2 and D2 have timestamped common
knowledge of $sent(m')$ with timestamp $T_0$. It is easy to check that $C^T$ satisfies the fixed
point axiom and the induction rule, as well as all of the axioms of S5 except for the
knowledge axiom. In this respect, $C^T$ resembles $C$ more closely than $C^\varepsilon$ and $C^\Diamond$ do.
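
The choice of the timestamp $t_S + \varepsilon + \delta$ in the scenario above can be checked with a few
lines of arithmetic. The sketch below assumes, for illustration only, that the difference
between the two clocks stays constant while the message is in transit; the function names
and the sampled grid of delays and skews are hypothetical.

```python
from fractions import Fraction


def arrival_readings(t_s, delay, skew):
    """Clock readings at the moment m' arrives.

    t_s is the send time on R2's clock, delay the actual transit time, and
    skew the value of D2's clock minus R2's clock (held constant here).
    """
    on_r2 = t_s + delay
    on_d2 = on_r2 + skew
    return on_r2, on_d2


def arrives_by(deadline, t_s, eps, delta, steps=8):
    """Check every sampled delay in [0, eps] and skew in [-delta, delta]."""
    for i in range(steps + 1):
        delay = eps * Fraction(i, steps)
        for j in range(steps + 1):
            skew = delta * Fraction(2 * j - steps, steps)
            if max(arrival_readings(t_s, delay, skew)) > deadline:
                return False
    return True


t_s, eps, delta = Fraction(5), Fraction(2), Fraction(1)
T0 = t_s + eps + delta
```

With these sample values, `arrives_by(T0, t_s, eps, delta)` holds, whereas the tighter
deadline `t_s + eps` fails when D2's clock runs ahead by the full $\delta$. This is why the
message has to promise arrival by $t_S + \varepsilon + \delta$ rather than $t_S + \varepsilon$ if the claim is to be
true on both clocks.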

It is interesting to investigate how the relativistic notion of timestamped common
knowledge relates to the notions of common knowledge, $\varepsilon$-common knowledge, and
$\Diamond$-common knowledge. Not surprisingly, the relative behavior of the clocks in the system
plays a crucial role in determining the meaning of $C^T$.

Theorem 12: For any fact $\varphi$ and view-based interpretation,

(a) if it is guaranteed that all clocks show identical times, then at time $T$ on any
processor's clock, $C^T_G\varphi \equiv C_G\varphi$.

(b) if it is guaranteed that all clocks are within $\varepsilon$ time units of each other, then at
time $T$ on any processor's clock, $C^T_G\varphi \supset C^\varepsilon_G\varphi$.

(c) if it is guaranteed that each local clock reads $T$ at some time, then $C^T_G\varphi \supset C^\Diamond_G\varphi$. ∎

Theorem 12 gives conditions under which $C^T$ can be replaced by $C$, $C^\varepsilon$, and $C^\Diamond$. A
weak converse of Theorem 12 holds as well. Suppose the processors are able to set their
clocks to a commonly agreed upon time $T$ when they come to know $C_G\varphi$ (resp. come
to know $C^\varepsilon_G\varphi$, $C^\Diamond_G\varphi$). Then it is easy to see that whenever $C_G\varphi$ (resp. $C^\varepsilon_G\varphi$, $C^\Diamond_G\varphi$) is
attainable, so is $C^T_G\varphi$.

In many distributed systems timestamped common knowledge seems to be a more 
appropriate notion to reason about than "true" common knowledge. Although common 
knowledge cannot be attained in practical systems, timestamped common knowledge 
is attainable in many cases of interest and seems to correspond closely to the relevant 
phenomena with which protocol designers are confronted. For example, in distributed 
protocols that work in phases, we speak of the state of the system at the beginning of
phase 2, at the end of phase $k$, and so on. It is natural to think of the phase number
as a "clock" reading, and consider knowledge about what holds at the different phases
as "timestamped" knowledge, with the phase number being the timestamp. In certain
protocols for Byzantine agreement, for example, the nonfaulty processors attain common
knowledge of the decision value at the end of phase $k$ (cf. [DM90], [MT88]). In practical
systems in which the phases do not end simultaneously at the different sites of the
system, the processors can be thought of as actually attaining timestamped common
knowledge of the decision value, with the timestamp being "the end of phase $k$". Indeed,
protocols like the atomic broadcast protocol of [CASD85] are designed exactly for the
purpose of attaining timestamped common knowledge. (See [NT93] for more discussion
of timestamped common knowledge.)



13 Internal knowledge consistency 

We have seen that common knowledge closely corresponds to the ability to perform 
simultaneous actions. In the last few sections we introduced a number of related states 
of knowledge corresponding to weaker forms of coordinated actions. Such weaker forms 
of coordination are often sufficient for many practical applications. This helps explain 
the paradox of the happy existence of practical distributed systems despite the apparent 
need for common knowledge and the negative results of Theorem 8.

However, there are situations where we act as if - or we would like to carry out our 
analysis as if - we had true common knowledge, not a weaker variant. For example, in 
the muddy children puzzle, even though simultaneity may not be attainable, we want to 
assume that the children do indeed have common knowledge of the father's statement. As 
another example, consider a protocol that proceeds in phases, in which it is guaranteed 
that no processor will ever receive a message out of phase. In many cases, all the aspects 
of this protocol that we may be interested in are faithfully represented if we model the 
system as if it were truly synchronous: all processors switch from one phase to the next 
simultaneously. 

Intuitively, in both cases, the assumption of common knowledge seems to be a safe 
one, even if it is not quite true. We would like to make this intuition precise. Recall that 
an epistemic interpretation is one that specifies what a processor believes at any given
point as a function of the processor's history at that point. An epistemic interpretation $\mathcal{I}$
is a knowledge interpretation if it is knowledge consistent, i.e., if it has the property
that whenever $(\mathcal{I}, r, t) \models K_i\varphi$ then also $(\mathcal{I}, r, t) \models \varphi$. Now an epistemic interpretation
that is not knowledge consistent may nevertheless be internally knowledge consistent,
which intuitively means that the processors never obtain information from within the
system that would contradict the assumption that the epistemic interpretation is in fact
a knowledge interpretation. In other words, no processor ever has information that
implies that the knowledge axiom $K_i\varphi \supset \varphi$ is violated. More formally, an epistemic
interpretation $\mathcal{I}$ for a system $R$ is said to be internally knowledge consistent if there is
a subsystem $R' \subseteq R$ such that $\mathcal{I}$ is a knowledge interpretation when restricted to $R'$,
and for all processors $p_i$ and points $(r, t)$ of $R$, there is a point $(r', t')$ in $R'$ such that
$h(p_i, r, t) = h(p_i, r', t')$.
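
For a finite toy system, the two conditions in this definition can be checked mechanically.
The sketch below is an illustration under strong simplifying assumptions: beliefs are
restricted to primitive facts, a system is a dictionary of points, and a candidate
subsystem is given as a set of run names. All names and the example data are hypothetical.

```python
def knowledge_consistent(points, believes, runs):
    """True if every ascribed belief is true at every point of the given runs."""
    for (run, _time), info in points.items():
        if run not in runs:
            continue
        for proc, history in info["hist"].items():
            if not believes.get((proc, history), set()) <= info["facts"]:
                return False
    return True


def covers_histories(points, sub_runs):
    """True if each local history occurring anywhere also occurs in sub_runs."""
    seen = {(proc, h)
            for (run, _t), info in points.items() if run in sub_runs
            for proc, h in info["hist"].items()}
    return all((proc, h) in seen
               for info in points.values()
               for proc, h in info["hist"].items())


def internally_consistent(points, believes, sub_runs):
    """Both conditions of the definition, for one candidate subsystem."""
    return (knowledge_consistent(points, believes, sub_runs)
            and covers_histories(points, sub_runs))


# Two runs that look identical to both processors; the fact is true in one only.
points = {
    ("sync", 1): {"facts": {"heard_together"},
                  "hist": {"a": "heard", "b": "heard"}},
    ("skewed", 1): {"facts": set(),
                    "hist": {"a": "heard", "b": "heard"}},
}
believes = {("a", "heard"): {"heard_together"},
            ("b", "heard"): {"heard_together"}}
```

Here the ascribed beliefs are not knowledge consistent over both runs, but they are
internally knowledge consistent with the subsystem `{"sync"}`, since no local history
distinguishes the two runs.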

Given that epistemic interpretations ascribe knowledge (or, perhaps more appropri- 
ately in this case, beliefs) to processors as a function of the processors' histories, the 
above definition implies that whenever a processor is ascribed knowledge of a certain fact 
at a point of R, then as far as any events involving this processor at the current and at 
any future time are concerned, it is consistent to assume that the fact does indeed hold. 

Using the notion of internal knowledge consistency, we can make our previous intu- 
itions precise. When analyzing the muddy children puzzle, we assume that the children 
will never discover that they did not hear and comprehend the father's statement simul- 
taneously. We take the set R' from the definition of internal knowledge consistency here 
to be precisely the set of runs where they did hear and comprehend the father's state- 
ment simultaneously. Similarly, in the case of the protocol discussed above, the set R' is 
the set where all processors advance from one phase to the next truly simultaneously. It 
now also makes sense to say that under reasonable conditions processors can safely use 
an "eager" protocol corresponding to the eager epistemic interpretation of Section 8, in 
which processors act as if they had common knowledge, even though common knowledge 
does not hold. It is possible to give a number of conditions on the ordering of events 
in the system that will ensure that it will be internally knowledge consistent for the 
processors to act as if they have common knowledge. 

For further discussion on internal knowledge consistency, see the recent paper by
Neiger [Nei88].



14 Conclusions 

In this paper, we have tried to bring out the important role of reasoning about knowledge 
in distributed systems. We have shown that reasoning about the knowledge of a group 
and its evolution can reveal subtleties that may not otherwise be apparent, can sharpen 
our understanding of basic issues, and can improve the high-level reasoning required in 
the design and analysis of distributed protocols and plans. 

We introduced a number of states of group knowledge, but focused much of our at- 
tention on one particular state, that of common knowledge. We showed that, in a precise 
sense, common knowledge is a prerequisite for agreement. However, we also showed that 
in many practical systems common knowledge is not attainable. This led us to consider
three variants of common knowledge ($\varepsilon$-common knowledge, eventual common knowledge,
and timestamped common knowledge) that are attainable in practice, and may
suffice for carrying out a number of actions. The methodology we introduce for constructing
these variants of common knowledge, involving the fixed-point operator, can
be used to construct other useful variants of common knowledge. Indeed, recent papers
have introduced concurrent common knowledge [PT92], probabilistic common knowledge
[FH94], and polynomial time common knowledge [Mos88] using this methodology.



There is clearly much more work to be done in terms of gaining a better under- 
standing of knowledge in distributed systems. This paper considers a general model 
of a distributed system. It would also be useful to consider knowledge in distributed 
systems with particular properties. The work of Chandy and Misra [CM86] is an interesting
study of this kind (see [DM90, FHV92, Had87] for other examples). We carried out a
knowledge-based analysis of the coordinated attack problem here. Since this paper first
appeared, a number of other problems, including Byzantine agreement, distributed
commitment, and mutual exclusion, have been analyzed in terms of knowledge (see [CM86],
[DM90], [Had87], [HZ92], [ML90], [MT88], [NT93]). Such knowledge-based analyses both
shed light on the problem being studied and improve our understanding of the
methodology. More studies of this kind would further deepen our understanding of the
issues involved.

Another general direction of research is that of using knowledge for the specification 
and verification of distributed systems. (See [KT86] for an initial step in this direction.)
Formalisms based on knowledge may prove to be a powerful tool for specifying and 
verifying protocols, and may also be readily applicable to the synthesis of protocols and 
plans. Temporal logic has already proved somewhat successful in this regard [EC82,
MW84].

Our analysis of the muddy children puzzle and the coordinated attack problem, as
well as the work in [MDH86], [HF85], [DM90], [MT88], illustrates how subtle the relationship
between knowledge, action, and communication in a distributed system can be. In this
context, Halpern and Fagin (cf. [HF85]) look at knowledge-based protocols, which are
protocols in which a processor's actions are explicitly based on the processor's knowledge.
This provides an interesting generalization of the more standard notions of protocols.

In the long run, we hope that a theory of knowledge, communication, and action will 
prove rich enough to provide general foundations for a unified theoretical treatment of 
distributed systems. Such a theory also promises to shed light on aspects of knowledge 
that are relevant to related fields.

Appendix A

In this appendix we present a logic with a greatest fixed point operator and illustrate
how common knowledge and variants of common knowledge can be formally defined as
greatest fixed points. Our presentation follows that of Kozen [Koz83].

Intuitively, given a system $R$, a formula $\varphi$ partitions the points of $R$ into two sets:
those that satisfy $\varphi$, and those that do not. We can identify a formula with the set of
points that satisfy it. In order to be able to define fixed points of certain formulas, which
is our objective in this appendix, we consider formulas that may contain a free variable 
whose values range over subsets of the points of R. Once we assign a set of points to the 
free variable, the formula can be associated with a set of points in a straightforward way 
(as will be shown below). Thus, such a formula can be viewed as a function from subsets 
of R to subsets of R. (A formula with no free variable is then considered a constant 
function, yielding the same subset regardless of the assignment.) 

Before we define the logic more formally, we need to review a number of relevant facts
about fixed points. Suppose $S$ is a set and $f$ is a function mapping subsets of $S$ to subsets
of $S$. A subset $A$ of $S$ is said to be a fixed point of $f$ if $f(A) = A$. A greatest (respectively,
least) fixed point of $f$ is a set $B$ such that $f(B) = B$, and if $f(A) = A$, then $A \subseteq B$ (resp.
$B \subseteq A$). It follows that if $f$ has a greatest fixed point $B$, then $B = \bigcup\{A : f(A) = A\}$.
The function $f$ is said to be monotone increasing if $f(A) \subseteq f(B)$ whenever $A \subseteq B$
and monotone decreasing if $f(A) \supseteq f(B)$ whenever $A \subseteq B$. The Knaster-Tarski theorem
(cf. [Tar55]) implies that a monotone increasing function has a greatest (and a least) fixed
point. Given a function $f$ and a subset $A$, define $f^0(A) = A$ and $f^{i+1}(A) = f(f^i(A))$.
$f$ is said to be downward continuous if $f(\bigcap_i A_i) = \bigcap_i f(A_i)$ for all sequences $A_1, A_2, \ldots$
with $A_1 \supseteq A_2 \supseteq \cdots$. Given a monotone increasing and downward continuous function $f$
it is not hard to show that the greatest fixed point of $f$ is the set $\bigcap_{k<\omega} f^k(S)$. We remark
that if $f$ is monotone increasing but not downward continuous, then we can still obtain
the greatest fixed point of $f$ in this fashion, but we have to extend the construction by
defining $f^\alpha$ for all ordinals $\alpha$.$^8$
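
On a finite set $S$ the iteration $f^0(S), f^1(S), \ldots$ described above stabilizes after finitely
many steps, so the greatest fixed point can be computed directly. The following sketch
assumes a finite universe and a monotone increasing $f$; the function names and the small
example are hypothetical and serve only to illustrate the construction.

```python
def greatest_fixed_point(f, universe):
    """Iterate f downward from the whole universe until nothing changes.

    Assumes universe is finite and f is monotone increasing; under those
    assumptions the loop stops at the greatest fixed point of f.
    """
    current = frozenset(universe)
    while True:
        following = frozenset(f(current))
        if following == current:
            return current
        current = following


def least_fixed_point(f):
    """Dual construction: iterate f upward from the empty set."""
    current = frozenset()
    while True:
        following = frozenset(f(current))
        if following == current:
            return current
        current = following


# Hypothetical example: keep the "good" states all of whose successors are kept.
successors = {0: {1}, 1: {0}, 2: {3}, 3: {3}}
good = {0, 1, 2}


def step(kept):
    return {s for s in good if successors[s] <= kept}
```

For this example the downward iteration passes through $\{0,1,2,3\}$, $\{0,1,2\}$, and
$\{0,1\}$, where it stabilizes, while the upward iteration from the empty set never leaves
the empty set. The two fixed points therefore differ, which is why the choice of the
greatest one matters in the definitions that follow.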

We are now in a position to formally define our logic. We start with a set $\Phi =
\{P, Q, P_1, \ldots\}$ of primitive propositions and a single propositional variable $X$. We form
more complicated formulas by allowing the special formula true and then closing off
under conjunction, negation, the modal operators $K_i$, $E_G$, $E^\varepsilon_G$, and $E^\Diamond_G$ for every group $G$
of processors, and the greatest fixed point operator $\nu X$. Thus, if $\varphi$ and $\psi$ are formulas,
then so are $\neg\varphi$, $\varphi \wedge \psi$, $K_i\varphi$, $E_G\varphi$, $E^\varepsilon_G\varphi$, $E^\Diamond_G\varphi$, and $\nu X.\varphi$ (read "the greatest fixed point
of $\varphi$ with respect to $X$"). However, we place a syntactic restriction, described below, on
formulas of the form $\nu X.\varphi$.

Just as $\forall x$ in first-order logic binds occurrences of $x$, $\nu X$ binds occurrences of $X$.
Thus, in a formula such as $X \wedge \neg E^\varepsilon_G(\nu X.[X \wedge (K_1 X \wedge K_2 X)])$, the first occurrence of $X$
is free, while the rest are bound. We say that a free occurrence of $X$ in a formula $\varphi$ is
positive if it is in the scope of an even number of negation signs, and negative if it is in
the scope of an odd number of negation signs. Thus, in a formula such as $X \wedge \neg K_1 X$, the
first occurrence of $X$ is positive while the second is negative. The restriction on formulas
of the form $\nu X.\varphi$ is that all free occurrences of $X$ in $\varphi$ must be positive; the point of this
restriction will be explained below.

8 We can similarly define a function $f$ to be upward continuous if $f(\bigcup_i A_i) = \bigcup_i f(A_i)$ for all sequences
$A_1, A_2, \ldots$ with $A_1 \subseteq A_2 \subseteq \cdots$. For monotone increasing upward continuous functions $f$, the least fixed
point of $f$ is $\bigcup_{k<\omega} f^k(\emptyset)$. Again, to get least fixed points in the general case, we have to extend this
construction through the ordinals.

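The next step is to associate with each formula a function. Given a distributed system
represented by its set of runs $R$, let $S = R \times [0, \infty)$. A model $\mathcal{M}$ is a triple $(S, \pi, v)$,
where $S$ is as above, $\pi$ associates a truth assignment to the primitive propositions with
each point in $S$, and $v : \{1, \ldots, m\} \times S \to \Sigma$ is an assignment of views (from a set of
states $\Sigma$) to the processors at the points of $S$. We now associate with each formula $\varphi$ a
function $\varphi^{\mathcal{M}}$ from subsets of $S$ to subsets of $S$. Intuitively, if no occurrences of $X$ are free
in $\varphi$, then $\varphi^{\mathcal{M}}$ will be a constant function, and $\varphi^{\mathcal{M}}(A)$ will be the set of points where $\varphi$
is true (no matter how we choose $A$). If $X$ is free in $\varphi$, then $\varphi^{\mathcal{M}}(A)$ is the set of points
where $\varphi$ is true if $A$ is the set of points where $X$ is true. We define $\varphi^{\mathcal{M}}(A)$ by induction
on the structure of $\varphi$ as follows:

(a) $X^{\mathcal{M}}(A) = A$ (so $X^{\mathcal{M}}$ is the identity function).

(b) $P^{\mathcal{M}}(A) = \{s \in S : \pi(s)(P) = \mathbf{true}\}$ for a primitive proposition $P$.

(c) $true^{\mathcal{M}}(A) = S$.

(d) $(\neg\varphi)^{\mathcal{M}}(A) = S - \varphi^{\mathcal{M}}(A)$.

(e) $(\varphi \wedge \psi)^{\mathcal{M}}(A) = \varphi^{\mathcal{M}}(A) \cap \psi^{\mathcal{M}}(A)$.

(f) $(K_i\varphi)^{\mathcal{M}}(A) = \{(r, t) \in S : \text{for all } (r', t') \in S,\ v(p_i, r, t) = v(p_i, r', t') \text{ implies } (r', t') \in \varphi^{\mathcal{M}}(A)\}$.

(g) $(E_G\varphi)^{\mathcal{M}}(A) = \bigcap_{i \in G} (K_i\varphi)^{\mathcal{M}}(A)$.

(h) $(E^\varepsilon_G\varphi)^{\mathcal{M}}(A) = \{(r, t) \in S : \text{there exists an interval } I = [t', t' + \varepsilon] \text{ with } t \in I, \text{ such that }
\forall p_i \in G\ \exists t_i \in I\ ((r, t_i) \in (K_i\varphi)^{\mathcal{M}}(A))\}$.

(i) $(E^\Diamond_G\varphi)^{\mathcal{M}}(A) = \{(r, t) \in S : \forall p_i \in G\ \exists t_i\ ((r, t_i) \in (K_i\varphi)^{\mathcal{M}}(A))\}$.

(j) $(\nu X.\varphi)^{\mathcal{M}}(A) = \bigcup\{B : \varphi^{\mathcal{M}}(B) = B\}$.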
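
Clauses (f), (g), and (j) can be tried out on a finite set of points. The sketch below
is only an illustration: it ignores the temporal structure of points (so it does not
cover clauses (h) and (i)), it computes the greatest fixed point by downward iteration
(valid here because the set of points is finite), and the four-point model with its view
labels is hypothetical.

```python
def knows(i, view, points, target):
    """Points where processor i's view rules out every point outside target."""
    return {s for s in points
            if all(t in target for t in points if view[i][t] == view[i][s])}


def everyone_knows(group, view, points, target):
    """Intersection of the individual knowledge sets, as in clause (g)."""
    result = set(points)
    for i in group:
        result &= knows(i, view, points, target)
    return result


def common_knowledge(group, view, points, phi):
    """Greatest fixed point of X -> E_G(phi and X), by downward iteration."""
    current = set(points)
    while True:
        following = everyone_knows(group, view, points, phi & current)
        if following == current:
            return current
        current = following


# Hypothetical model: processor 1 cannot tell a from b, processor 2 cannot
# tell b from c, and both recognize d. The fact phi holds at a, b, and d.
points = {"a", "b", "c", "d"}
view = {1: {"a": "u", "b": "u", "c": "v", "d": "w"},
        2: {"a": "x", "b": "y", "c": "y", "d": "z"}}
phi = {"a", "b", "d"}
```

In this model everyone knows `phi` at the points `a` and `d`, but `phi` is common
knowledge only at `d`: from `a`, the chain of indistinguishable points a, b, c leads to a
point where `phi` fails.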

Now by an easy induction on the structure of formulas we can prove the following
facts:

1. If $\varphi$ is a formula in which all free occurrences of $X$ are positive (resp. negative),
then $\varphi^{\mathcal{M}}$ is monotone increasing (resp. monotone decreasing). Note that our
syntactic restriction on formulas of the form $\nu X.\varphi$ guarantees that for a well-formed
formula of this form, the function $\varphi^{\mathcal{M}}$ is monotone increasing. As a consequence,
$(\nu X.\varphi)^{\mathcal{M}}(A)$ is the greatest fixed point of the function $\varphi^{\mathcal{M}}$.

2. If $\varphi$ is a formula with no free variables, then $\varphi^{\mathcal{M}}$ is a constant function. In particular,
observe that $(\nu X.\varphi)^{\mathcal{M}}$ is necessarily a constant function (the definition shows
that $(\nu X.\varphi)^{\mathcal{M}}(A)$ is independent of the choice of $A$). As well, it is easy to check
that if $\varphi$ is a valid formula such as $\neg(P \wedge \neg P)$, then $\varphi^{\mathcal{M}}(A) = S$.

3. For formulas in which the variable $X$ does not appear (so, in particular, for formulas
not involving the greatest fixed point operator), $\varphi^{\mathcal{M}}(A) = \{(r, t) : (\mathcal{I}_v, r, t) \models \varphi\}$,
where $\mathcal{I}_v$ is the view-based interpretation associated with the view function $v$.
(Again, this is true for any choice of $A$, since by the previous observation, $\varphi^{\mathcal{M}}$
is a constant function if there is no occurrence of $X$ in $\varphi$.) Thus, if we define
$(\mathcal{M}, r, t) \models \varphi$ iff $(r, t) \in \varphi^{\mathcal{M}}(\emptyset)$, then this definition extends our previous definition
(in that for formulas in which the variable $X$ does not appear, we have $(\mathcal{M}, r, t) \models \varphi$
iff $(\mathcal{I}_v, r, t) \models \varphi$).

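Given the machinery at our disposal, we can now formally define $C_G\varphi$ as $\nu X.E_G(\varphi \wedge
X)$, define $C^\varepsilon_G\varphi$ as $\nu X.E^\varepsilon_G(\varphi \wedge X)$, and define $C^\Diamond_G\varphi$ as $\nu X.E^\Diamond_G(\varphi \wedge X)$. It follows from our
characterization of greatest fixed points of downward continuous functions that if $\varphi^{\mathcal{M}}$ is
downward continuous, then $\nu X.\varphi$ is equivalent to $\varphi_0 \wedge \varphi_1 \wedge \cdots$, where $\varphi_0$ is true, $\varphi_{i+1}$ is
$\varphi[\varphi_i/X]$, and $\varphi[\psi/X]$ denotes the result of substituting $\psi$ for the free occurrences of $X$
in $\varphi$. It is easy to check that $(E_G(\varphi \wedge X))^{\mathcal{M}}$ is downward continuous if $\varphi^{\mathcal{M}}$ is downward
continuous. In particular, if $\varphi$ has no free occurrences of $X$ (so that $\varphi^{\mathcal{M}}$ is constant), it
follows that we have:

$$C_G\varphi \equiv E_G\varphi \wedge E_G(\varphi \wedge E_G\varphi) \wedge E_G(\varphi \wedge E_G(\varphi \wedge E_G\varphi)) \wedge \cdots.^9$$

Since $E_G(\psi_1 \wedge \psi_2) \equiv (E_G\psi_1 \wedge E_G\psi_2)$ it follows that

$$C_G\varphi \equiv E_G\varphi \wedge E_G E_G\varphi \wedge \cdots.$$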

However, $(E^\varepsilon_G(\varphi \wedge X))^{\mathcal{M}}$ and $(E^\Diamond_G(\varphi \wedge X))^{\mathcal{M}}$ are not necessarily downward continuous.
The reason that $(E^\Diamond_G(\varphi \wedge X))^{\mathcal{M}}$ is not downward continuous is that an infinite collection
of facts can each eventually hold, without them necessarily all holding simultaneously at
some point. We have already seen one example of this phenomenon in Section 11. For
another example, suppose we are working in a system with an unbounded global clock,
and let $A_i = (current\_time > i)^{\mathcal{M}}$. Since the clock is unbounded, it follows that $A_i \ne \emptyset$
for all $i$, but $\bigcap_i A_i = \emptyset$. Taking $\psi$ to be the formula $E^\Diamond(\varphi \wedge X)$, it is easy to see that
$(r, 0) \in \psi^{\mathcal{M}}(A_i)$ for all $i$, and hence $\bigcap_i \psi^{\mathcal{M}}(A_i) \ne \psi^{\mathcal{M}}(\bigcap_i A_i)$.

We can construct a similar example in the case of $E^\varepsilon$, because we have taken time to
range over the reals. For example, if we take $x_i$ to be an infinite sequence of real numbers
converging from below to $\varepsilon$, take $A_i = (current\_time \in (x_i, \varepsilon))^{\mathcal{M}}$, and now take $\psi$ to be
the formula $E^\varepsilon(\varphi \wedge X)$, then again we have $\bigcap_i A_i = \emptyset$, and $(r, 0) \in \bigcap_i \psi^{\mathcal{M}}(A_i)$. This
example does depend crucially on the fact that time ranges over the reals. If instead
we had taken time to range over the natural numbers, then we would in fact get downward
continuity.

9 Note that the formula on the right-hand side of the equivalence is not in our language, since we
have not allowed infinite conjunctions. However, we can easily extend the language to allow infinite
conjunctions in the obvious way so that the equivalence holds.

We encourage the reader to check that $C_G\varphi$, $C^\varepsilon_G\varphi$, and $C^\Diamond_G\varphi$ all satisfy the fixed point
axiom and the induction rule. The fixed point axiom is a special case of the more general
fixed point axiom $\nu X.\varphi \equiv \varphi[\nu X.\varphi/X]$, while the induction rule is a special case of the
more general induction rule for fixed points: from $\psi \supset \varphi[\psi/X]$ infer $\psi \supset \nu X.\varphi$. The
reader might also now wish to check that $C$ has the properties of S5, while $C^\varepsilon$ and $C^\Diamond$
satisfy the positive introspection axiom and the necessitation rule. Furthermore, for stable
facts $\varphi$ and complete-history interpretations, they also satisfy the consequence closure
axiom. $C^\varepsilon$ and $C^\Diamond$ satisfy neither the knowledge axiom nor the negative introspection
axiom. We remark that both notions satisfy weaker variants of the knowledge axiom:
$C^\varepsilon\varphi$ implies that $\varphi$ holds at some point at most $\varepsilon$ time units away from the current point,
while $C^\Diamond\varphi$ implies that $\varphi$ holds (at least) at some point during the run.

It is straightforward to extend the above framework to include explicit individual clock
times in order to define $C^T_G\varphi$ (see [NT93] for more details). Here, for example, it is the
case that $(E^T(\varphi \wedge X))^{\mathcal{M}}$ is downward continuous, and $E^T$ distributes over conjunction;
hence $C^T$ will coincide with the appropriate infinite conjunction. Similar treatments can
be applied to many related variants of common knowledge (see, for example, [FH94,
Mos88, PT92]).



Appendix B 

In this appendix we fill in the details of the proof that common knowledge cannot be
attained in practical systems (Theorem 8 in Section 8).

Our first step is to establish a general condition (namely, that the initial point of a
run is reachable from any later point) under which common knowledge can be neither
gained nor lost. We remark that Chandy and Misra have shown that in the case of
completely asynchronous, event-driven systems where communication is not guaranteed,
common knowledge of any fact can be neither gained nor lost [CM86]. Since it is easy to
see that, in such systems, the initial point of a run is reachable from all later points, our
result provides a generalization of that of [CM86].

Proposition 13: Let r G R be a run in which the point (r, 0) is G-reachable from (r, t) in 
the graph corresponding to the complete-history interpretation, and let X be a knowledge 
interpretation for R. Then for all formulas ip we have (X, r, t) \= C G ip iff (X, r, 0) |= C G <p. 

Proof: Fix a run $r$, time $t$, and formula $\varphi$. Since $(r, 0)$ is $G$-reachable from $(r, t)$ in 
the graph corresponding to the complete-history interpretation, there exist points $(r_0, t_0)$, 
$(r_1, t_1), \ldots, (r_k, t_k)$ such that $(r, t) = (r_0, t_0)$, $(r, 0) = (r_k, t_k)$, and for every $i < k$ there 
is a processor $j_i \in G$ that has the same history at $(r_i, t_i)$ and at $(r_{i+1}, t_{i+1})$. We can now 
prove by induction on $i$, using Lemma |], that $(\mathcal{I}, r, t) \models C_G\varphi$ iff $(\mathcal{I}, r_i, t_i) \models C_G\varphi$. The 
result follows. ∎ 
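
The reachability condition used in Proposition 13 can be illustrated on a finite toy system. The following is a minimal sketch, not part of the original development: the point labels, the `history` table, and the function name `g_reachable` are all hypothetical, and real systems have infinitely many points. It treats two points as adjacent when some processor in $G$ has the same history at both, and collects everything reachable by breadth-first search.

```python
from collections import deque

def g_reachable(points, history, group, start):
    """Return the set of points G-reachable from `start`.

    points  : iterable of hashable point labels, e.g. ("r", 0)
    history : dict mapping (processor, point) -> hashable history
    group   : the set G of processors
    """
    points = list(points)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in points:
            if other in seen:
                continue
            # One edge: some processor in G has the same history at both points.
            if any(history[(p, current)] == history[(p, other)] for p in group):
                seen.add(other)
                queue.append(other)
    return seen

# Hypothetical toy data: three points, two processors, made-up histories.
points = [("r", 0), ("r", 1), ("r2", 1)]
history = {
    ("p1", ("r", 0)): "a",  ("p2", ("r", 0)): "x",
    ("p1", ("r", 1)): "b",  ("p2", ("r", 1)): "y",
    ("p1", ("r2", 1)): "a", ("p2", ("r2", 1)): "y",
}
reach = g_reachable(points, history, {"p1", "p2"}, ("r", 1))
```

In this toy table, $(r, 0)$ is reached from $(r, 1)$ in two steps: processor p2 cannot distinguish $(r, 1)$ from $(r2, 1)$, and processor p1 cannot distinguish $(r2, 1)$ from $(r, 0)$. With $G = \{p1\}$ alone, nothing beyond the starting point is reachable.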

We next provide a formal definition of systems with temporal imprecision, and show 
that in such systems, the initial point of a run is always reachable from later points. A 
system R has temporal imprecision if 

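$$\forall r \in R \;\, \forall t \ge 0 \;\, \forall i \;\, \forall j \ne i \;\, \exists \delta > 0 \;\, \forall \delta' \in [0, \delta) \;\, \exists r' \;\, \forall t' \le t$$
$$\bigl(h(p_i, r, t') = h(p_i, r', t' + \delta') \;\wedge\; h(p_j, r, t') = h(p_j, r', t')\bigr).$$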

Intuitively, this means that processors cannot perfectly coordinate their notions of time 
in a system with temporal imprecision. One processor might always be a little behind 
the others. 

By reachable in the following lemma we mean reachable (in the sense of Section 6) 
with respect to the view function defined by the complete-history interpretation. 



Lemma 14: If $R$ is a system with temporal imprecision, then for all runs $r \in R$ and 
times $t$, the point $(r, 0)$ is reachable from $(r, t)$. 



Proof: Let $R$ be a system with temporal imprecision and $(r, t)$ be a point of $R$. 
Suppose $t \ne 0$ (otherwise clearly $(r, 0)$ is reachable from $(r, t)$). Let $t_0$ be the greatest 
lower bound of the set $\{t' : (r, t'')$ is reachable from $(r, t)$ for all $t'' \in [t', t]\}$. We will show 
that $(r, t_0)$ is reachable from $(r, t)$ and that $t_0 = 0$. Since $R$ is a system with temporal 
imprecision, there exists a $\delta$ such that for all $\delta'$ with $0 < \delta' < \delta$, there exists a run $r'$ 
such that for all $t' \le t$, we have $h(p_1, r, t') = h(p_1, r', t' + \delta')$ and $h(p_i, r, t') = h(p_i, r', t')$ 
for $i \ne 1$. If $\delta' \le t' \le t$, it follows that $(r', t')$ is reachable from $(r, t')$ and $(r, t' - \delta')$ is 
reachable from $(r', t')$. By transitivity of reachability, we have that $(r, t' - \delta')$ is reachable 
from $(r, t')$, and by symmetry, that $(r, t')$ is reachable from $(r, t' - \delta')$. It now follows 
that $(r, t - \delta')$ is reachable from $(r, t)$ for all $\delta' < \min(\delta, t)$. Thus $t_0 \le t - \min(\delta, t)$. 
Furthermore, if $\delta' < \min(\delta, t)$, then we know that $(r, t_0 + \delta')$ is reachable from both $(r, t_0)$ 
and $(r, t)$. It thus follows that $(r, t_0)$ is reachable from $(r, t)$. Finally, if $t_0 \ne 0$, then we 
know that $(r, t_0 - \delta')$ is reachable from $(r, t_0)$ (and hence from $(r, t)$) for all $\delta' < \min(t_0, \delta)$. 
But this contradicts our choice of $t_0$. Thus $t_0 = 0$, and $(r, 0)$ is reachable from $(r, t)$. ∎ 
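
The intuition behind this proof is that each application of temporal imprecision lets us step back in time by some $\delta' < \delta$ while staying within the set of reachable points. The sketch below is only an illustration of that intuition, not the greatest-lower-bound argument itself: the function name `descent_chain` is hypothetical, and it assumes that a single step size works at every time up to $t$, as the quantification over all $t' \le t$ suggests. It lists the times visited when stepping back from $t$ to 0.

```python
from fractions import Fraction

def descent_chain(t, delta_prime):
    """Times t, t - d', t - 2d', ..., 0 visited when stepping back by d' > 0.

    The last step may be shorter than d', so every step has size in (0, d'].
    """
    if delta_prime <= 0:
        raise ValueError("delta_prime must be positive")
    chain = [t]
    while chain[-1] > 0:
        chain.append(max(chain[-1] - delta_prime, Fraction(0)))
    return chain

# Illustrative values only: t = 1 and a step of 3/10.
chain = descent_chain(Fraction(1), Fraction(3, 10))
```

Exact rational arithmetic is used so that the chain lands on 0 without rounding error. For these values the chain has four steps, the last of which is shorter than the others.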

Theorem 8 now follows as an immediate corollary to Lemma 14 and Proposition 13. 

We conclude by showing that many practical systems do indeed have temporal 
imprecision (although the $\delta$'s involved in some cases might be very small). Perhaps through 
statistical data, we can assume that for every communication link $l$ there are known lower 
and upper bounds $L_l$ and $H_l$ respectively on the message delivery time for messages over 
$l$. We assume that the message delivery time on the link $l$ is always in the open interval 
$(L_l, H_l)$. (We take the interval to be open here since it seems reasonable to suppose 
that if the system designer considers it possible that a message will take time $T$ to be 
delivered, then for some sufficiently small $\delta > 0$, he will also consider it possible that 
the delivery time is anywhere in the interval $(T - \delta, T + \delta)$; in this we differ slightly 
from [DHS86], [HMM85].) We define $f_l$ to be a message delivery function for link $l$ if 
$f_l : N \to (L_l, H_l)$. A run $r$ is consistent with $f_l$ if for all $n \in N$, $f_l(n)$ is the delivery 
time of the $n$th message in $r$ on link $l$. A system $R$ has bounded but uncertain message 
delivery times if for all links $l$ there exist bounds $L_l < H_l$ such that for all runs $r \in R$ and 
all message delivery functions $f_l : N \to (L_l, H_l)$, there exists a run $r'$ which is identical 
to $r$ except that message delivery time over the link $l$ is defined by $f_l$. More formally, 
$r'$ is consistent with $f_l$ and for all $i$, processor $p_i$ follows the same protocol, wakes up at 
the same time (i.e., $t_{init}(p_i, r) = t_{init}(p_i, r')$), and has the same initial state and the same 
clock readings in both $r$ and $r'$. 
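
The two definitions above (a message delivery function for a link, and a run being consistent with it) can be made concrete as follows. This is a minimal sketch under simplifying assumptions of our own: a run is represented only by the list of delivery times on one link, the function is checked on finitely many arguments, and the names `is_delivery_function`, `is_consistent`, and `run_delays` are hypothetical.

```python
from fractions import Fraction

def is_delivery_function(f, low, high, n_messages):
    """Check that f(n) lies in the open interval (low, high) for n < n_messages."""
    return all(low < f(n) < high for n in range(n_messages))

def is_consistent(run_delays, f):
    """run_delays[n] is the delivery time of the n-th message on the link."""
    return all(f(n) == delay for n, delay in enumerate(run_delays))

# Illustrative bounds L_l = 1 and H_l = 3, and a run with two messages on the link.
low, high = Fraction(1), Fraction(3)
run_delays = [Fraction(2), Fraction(5, 2)]

def f(n):
    # Agrees with the run on the messages it contains; 2 elsewhere (arbitrary choice).
    return run_delays[n] if n < len(run_delays) else Fraction(2)

ok_function = is_delivery_function(f, low, high, 10)
ok_consistent = is_consistent(run_delays, f)
```

The strict inequalities in `is_delivery_function` reflect the choice of an open interval: a delay equal to either bound is rejected.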

We say $R$ is a system with uncertain start times if there exists $\delta_0 > 0$ such that given 
a run $r \in R$, a processor $p_i$, and $\delta$ with $0 < \delta < \delta_0$, there is a run $r'$ which is identical 
to $r$ except that $p_i$ wakes up $\delta$ later in $r'$ with its clock readings (if there are clocks 
in the system) shifted back by $\delta$. More formally, for all $j \ne i$, processor $p_j$ follows the 
same protocol, wakes up at the same time, and has the same initial state in both $r$ and 
$r'$. Moreover, for all $k$, the delivery time for the $k$th message on link $l$ (if there is one) is 
the same in both $r$ and $r'$. All processors other than $p_i$ have the same clock readings in 
both $r$ and $r'$. Processor $p_i$ starts $\delta$ later in $r'$ than in $r$, although it has the same initial 
state in both runs, and $\tau(p_i, r, t) = \tau(p_i, r', t + \delta)$. 

For any practical system, it seems reasonable to assume that there will be some 
(perhaps very small) uncertainty in start times and, even if message delivery is guaranteed 
within a bounded time, that there is some uncertainty in message delivery time. These 
assumptions are sufficient to guarantee temporal imprecision, as the following result, 
whose proof is a slight modification of a result proved in [DHS86] on the tightness of 
clock synchronization achievable, shows: 

Proposition 15: A system with bounded but uncertain message delivery times and 
uncertain start times has temporal imprecision. 

Sketch of Proof: Let $(r, t)$ be a point of the system, and let $p_i$ be a processor. Let $\delta_0$ 
be as in the definition of uncertain start times. Since only a finite number of messages 
are received by time $t$ in $r$, there is some $\delta > 0$ such that the delivery times of these 
messages are more than $\delta$ greater than the lower bound for the particular link they were 
sent over, and more than $\delta$ less than the upper bound. Choose $\delta' < \min(\delta_0, \delta)$ and some 
processor $p_i$. Let $r'$ be a run in which all processors $p_j \ne p_i$ start at the same time and 
in the same initial state as in $r$, have the same clock readings (if there are clocks), and 
all messages between such processors take exactly the same time as in $r$. In addition, 
processor $p_i$ starts $\delta'$ time units later in $r'$ than in $r$, messages to $p_i$ take $\delta'$ time units 
longer to be delivered, while messages from $p_i$ are delivered $\delta'$ time units faster than in $r$, 
and $p_i$'s clock readings (if there are clocks) are shifted by $\delta'$. Such a run $r'$ exists by our 
assumptions. It is not hard to check that run $r'$ has the property that for all times $t' \le t$, 
all processors $p_j \ne p_i$ have exactly the same history at time $t'$ in both $r$ and $r'$, while 
processor $p_i$ has the same history at $(r, t')$ and at $(r', t' + \delta')$. Since $(r, t)$ and $p_i$ were 
chosen arbitrarily, it thus follows that the system has temporal imprecision. ∎ 
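
The construction of $r'$ in this proof sketch can be mimicked on a small finite example. The following is a sketch under assumptions of our own rather than the construction itself: a run is reduced to start times plus a list of `(sender, receiver, send_time, delay)` tuples, a single pair of bounds is used for every link, clocks are omitted, and the names `slack`, `shift_processor`, and `arrivals` are hypothetical.

```python
from fractions import Fraction

def slack(messages, low, high):
    """Largest delta such that every delay is at least delta away from both bounds."""
    return min(min(delay - low, high - delay) for (_, _, _, delay) in messages)

def shift_processor(start, messages, pi, dp):
    """Build the shifted run: pi starts dp later, and delays are adjusted to match."""
    new_start = dict(start)
    new_start[pi] = start[pi] + dp
    new_messages = []
    for sender, receiver, sent, delay in messages:
        # Everything pi does happens dp later, including its sends.
        new_sent = sent + dp if sender == pi else sent
        new_delay = delay
        if sender == pi and receiver != pi:
            new_delay -= dp  # sent later, must arrive at the same real time
        if receiver == pi and sender != pi:
            new_delay += dp  # sent at the same time, must arrive dp later
        new_messages.append((sender, receiver, new_sent, new_delay))
    return new_start, new_messages

def arrivals(messages):
    """Real arrival time of each message, in order."""
    return [sent + delay for (_, _, sent, delay) in messages]

# Illustrative run: bounds (1, 3) on every link, two messages.
low, high = Fraction(1), Fraction(3)
start = {"p1": Fraction(0), "p2": Fraction(0)}
messages = [("p1", "p2", Fraction(1), Fraction(2)),
            ("p2", "p1", Fraction(4), Fraction(5, 2))]
delta = slack(messages, low, high)
dp = Fraction(1, 4)  # chosen below the slack; delta_0 is assumed to be larger
new_start, new_messages = shift_processor(start, messages, "p1", dp)
```

In this example the message received by p2 arrives at the same real time in both runs, the message received by p1 arrives `dp` later, and both adjusted delays stay inside the open interval because `dp` is smaller than the slack.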